







Index and Title Page in This Number























The Journal of 
EDUCATIONAL 


PSYCHOLOGY 


Devoted Primarily to the Scientific Study of Problems of 
Learning and Teaching. 








BOARD OF EDITORS


HAROLD RUGG, Chairman.
J. CARLETON BELL   ARTHUR I. GATES   RUDOLF PINTNER   L. M. TERMAN


FRANK N. FREEMAN   V. A. C. HENMON   BEARDSLEY RUML   E. L. THORNDIKE








Vol. XIX DECEMBER, 1928 $5.00 a year


70c per copy. 








CONTENTS 


Index and Title-page to Volume XIX (1928) . . . . . i
THE RELATION OF TESTS OF MEMORY AND LEARNING TO EACH
OTHER AND TO GENERAL INTELLIGENCE IN A HIGHLY
SELECTED ADULT GROUP. Henry E. Garrett . . . . 601
THE SAMPLING THEORY AS A VARIANT OF THE TWO FACTOR
THEORY. John Mackie . . . . . 614
A STUDY OF THE VALIDITY OF THE DOWNEY WILL-TEMPERAMENT
TEST. Robert S. Thompson . . . . 622
SEX DIFFERENCES ON ABILITY TESTS IN ART. Alfred S. Lewerenz . 629
A NOTE ON THE CORRELATION OF AVERAGES. Helen M. Walker . 636
ON THE STANDARD ERRORS OF THE MEAN DUE TO SAMPLING AND
TO MEASUREMENT. C. L. Huffaker and Harl R. Douglass . . 643
A NEW APPARATUS FOR PLOTTING AND A CHECKING METHOD FOR
SOLVING LARGE NUMBERS OF INTERCORRELATIONS. L. Dewey
Anderson and Herbert A. Toops . . . 650
A COMPARISON OF “APTITUDE” AND “TRAINING” TESTS FOR
PROGNOSIS. T. A. Langlie . . . . . 658
NOTES ON ARTICLES IN EDUCATIONAL PSYCHOLOGY IN CURRENT
ISSUES OF OTHER MAGAZINES . . . . . . . 666
NEW PUBLICATIONS IN EDUCATIONAL PSYCHOLOGY AND RELATED [...]

Published Monthly Except June to August by
Warwick & York, Inc., 
10 East Centre Street Baltimore, Md. 

















Entered as Second Class matter November 15, 1921, at the Post Office at Baltimore, Md., 
under the Act of March 3, 1879; additional entry as second class matter at York, Pa. 

























LABORATORY EXERCISES 
IN EDUCATIONAL 
STATISTICS 


With Tables 


by ROBERT LEE MORTON
Professor of Mathematics, College of 
Education, Ohio University 
These exercises, the outgrowth of seven 
years’ teaching experience, provide the 
practice in elementary statistics needed 
to supplement classroom lectures, text- 
book assignments, and class discussions. 
Tables also bound separately. 


Now Ready 


SILVER, BURDETT 
AND COMPANY 


New York Newark Boston Chicago 
San Francisco 








































TEACHERS’ 
HEADQUARTERS 








European 
Plan 


Entirely 
Fire-Proof 


Centrally 
Located 


EDWARD DAVIS 
Manager 


HOTEL RENNERT 


BALTIMORE, MARYLAND 





is the title of a new booklet 
written to help you in Teaching 
the Dictionary. Here are a few 


suggestions of the lessons in- 
cluded : 


First Dictionary Lessons
Relative Position of Letters 

How to Find Words 

What You Find 

Pronunciation 

How to Find Meanings 

Parts of Speech and Meanings 

Unusual Uses of Words 


Synonyms 

The Hyphen, Etc., Etc. 
Copies of this new booklet will be sent 
FREE to teachers upon request. 

G. & C. Merriam Company 
Springfield, Mass. 
Merriam-Webster Dictionaries
for over 85 years 
Look for the Circular 






































The Journal of 
Educational Psychology 


published by Warwick & York, Inc., Baltimore,
[...] Title page and index are bound in [...]


Manuscripts for publication, books, or other
[...] and [...] items should be
addressed to Harold Rugg, Chairman of the
Editorial Board, 400 W. 118th Street, New York


Subscribers should notify the publishers of change in
[...] for non-receipt
of an issue will be entertained unless made within two [...]


Warwick & York, Inc., 
Baltimore, Md. 






















THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Volume XIX December, 1928 Number 9 



















THE RELATION OF TESTS OF MEMORY AND 
LEARNING TO EACH OTHER AND TO GEN- 
ERAL INTELLIGENCE IN A HIGHLY 
SELECTED ADULT GROUP 


HENRY E. GARRETT 


Columbia University 


The present study¹ is concerned with two questions: (1) the
relation of memory and learning to general intelligence in a highly 
selected adult group; and (2) the presence or absence within a group of 
memory-learning tests of a common interrelating factor. The tests of 
memory and learning include well-known tests of rote and verbatim 
memory, tests of logical or meaningful memory, and tests of the 
speed and accuracy of connection-forming. (These tests are later 
described.) The test of general intelligence was the Thorndike 
Intelligence Examination for High School Graduates which is regularly 
given to the entering freshmen in Columbia College. 

The Subjects and the Tests.—During the academic year 1925-1926,
eight tests of memory and learning were given to more than two hundred 
Columbia College students enrolled in classes in beginning psychology. 
All of these students were of sophomore rank or better, as freshmen 
cannot take psychology. The records of one hundred fifty-eight men 
were found to be complete and usable; of these one hundred fifty-eight, 
forty per cent were sophomores, forty-seven per cent were juniors, and 
thirteen per cent seniors. The average age of the group was 19.4 years 
with an SD of 2 years. For each man there was available a Thorndike 
Test score which was taken to represent his general intellectual capacity. 

Full descriptions have been published of the Thorndike test by 
Wood² and others. Suffice it to say here that this examination, which





1 This study is one of several supported by a grant from the Council for 
Research in the Social Sciences, Columbia University. 
2 “Measurement in Higher Education.” New York, 1923.


requires about 3½ hours to administer, is a searching measure of
educational achievement as well as of intellectual capacity. The
Army Alpha, the Stanford-Binet, and other tests, modelled after these,
attempt to infer native capacity by measuring the accuracy (and the
speed) with which the testee can carry out fairly intricate directions,
“educe” relations of a symbolic and verbal character, answer questions
designed to sample his general information, etc. Specific school 
information is brought in only incidentally, and with no thought of 
testing educational acquisition as such. The Thorndike test, on the 
other hand, in addition to doing all of the things mentioned above, 
makes specific demands on the testee’s knowledge of the mathematics, 
science, literature, and history which he has presumably accumulated 
during four years of high school. Hence this examination is clearly a 
test of superior intellect, and is probably the most discriminative 
which we now possess. While the Army Alpha simply does not 
differentiate in the upper levels of intellect (the writer has had every 
man in a class of thirty score above one hundred forty) the Thorndike 
test does so quite sharply. 

The eight tests of memory and learning were (1) digit-span (vis.)
and (2) digit-span (aud.); (3) paired associates (vis.) and (4) paired
associates (aud.); (5) “logical” memory for a selection of fairly difficult
prose, (6) a Turkish-English vocabulary learning test,¹ (7) the digit-
symbol learning test (Whipple), and (8) a Code learning test. This
last test was an adaptation of the Code Test put by Terman in year
XVI, Average Adult, of the Stanford Revision. In our test, the code was
shown at the top of the page followed by two samples of code writing 
and a selection of ten lines of difficult prose. Instructions were to code 
as much of the given material as possible in ten minutes; the score was 
the number of words coded. 

Of these eight tests, the first five would be classified conventionally
as “memory” tests, and the last three as “learning” tests. This
distinction is probably more convenient than real. To be sure, the
three learning tests differ from the five memory tests in the motor
factor—speed of connection-forming—which they involve. Furthermore,
they probably have more to do with fixation and less with recall
than do the conventional “immediate” memory tests in which practice
plays a minor role. But both kinds of tests are concerned with measuring
the ability to make learned reactions; and hence we shall call them
hereafter memory-learning tests.





1 Adapted from Foster’s “Experiments in Psychology.” New York, 1923.
The Experimental Results.—Table I gives the intercorrelations of
ten variables—the eight memory-learning tests, the Thorndike Test
and Age—together with the means and the SD’s of the different
distributions. The reliability coefficients (the diagonal entries) are
printed in heavy type, and except in case of the logical memory test
are fairly satisfactory.¹


TABLE I.—INTERCORRELATIONS OF EIGHT MEMORY-LEARNING TESTS, THE
THORNDIKE EXAMINATION AND AGE, TOGETHER WITH THE MEANS AND
THE SD’S OF THE TESTS

N = 158

                                  (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)
1. Thorndike examination          .85   -.11    .18    [?]    .23    .29    .29    .09    .37    .31
2. Age                            ...   1.00   -.11   -.20   -.29   -.11    .02   -.19   -.22   -.25
3. Digit-span (vis.)              ...    ...    .68    .57    .03    .00    .03    .15    .20    .10
4. Digit-span (aud.)              ...    ...    ...    .80   -.05    .07   -.12    .10    .13    .15
5. Paired association (vis.)      ...    ...    ...    ...    .95    .59    .26    .31    .34    .26
6. Paired association (aud.)      ...    ...    ...    ...    ...    .90    .19    .12    .41    .17
7. Logical memory                 ...    ...    ...    ...    ...    ...    .60    .02    .20    .07
8. Digit-symbol                   ...    ...    ...    ...    ...    ...    ...    .95    .29    .57
9. Turkish-English vocabulary     ...    ...    ...    ...    ...    ...    ...    ...    .91    .39
10. Code learning                 ...    ...    ...    ...    ...    ...    ...    ...    ...    .85
Means                           85.15  19.43   9.13   8.36  34.53  21.28  30.47 287.92 161.53 196.00
SD’s                              [?]    [?]   1.64   1.56   9.53   9.90    [?]  47.10  42.91  41.01








As is evident from Table I, all of the memory-learning tests are 
positively related to general intelligence (Thorndike) and, with three 
exceptions, to each other. For the most part the r’s are low, the 
average for the table (age out) being .21; this general result fits in with 
those obtained by other workers who have studied comparable adult 
groups. Wissler² obtained r’s of .05 between auditory memory and
logical memory; .29 between auditory memory and visual memory; 





1 In two of the tests, Paired Associates (vis.) and Paired Associates (aud.),
there was some negative skewness in the distribution of scores obtained from the 
second giving of the test. This was due to the piling up or jamming of the scores 
at the “high” end of the distributions. As a result of this jamming, there was 
considerable curtailment of the second array which served to reduce the self- 
correlation of the tests. Correction was made as in the Army Alpha data (see 
Memoirs of the National Academy of Science, 1921, Vol. XV, pages 629-632) by 
use of formulas which redistribute the jammed scores on the assumption that the 
true form of the distribution is normal. These formulas may be found in Kelley’s 
Statistical Method, pages 223-228. 

2 Correlation of Mental and Physical Tests. Psychological Review Monographs, 
Vol. III, 1901. 
.19 between class standing and logical memory; .16 between class
standing and auditory memory. Bennett¹ reports a correlation of .51
between immediate visual memory for nonsense syllables and immediate
auditory memory for the same material; and an r of .62 between
immediate auditory and visual memory for digits. The correlations
for “mediate” memory (i.e., the number of readings necessary for
learning, or the time) were lower, that for nonsense syllables auditory
and visual being .03, and for digits auditory and visual being .08.
Kitson² obtained r’s of .09 between memory for numbers heard and
objects seen; -.09 between memory for numbers heard and logical
material heard; and .26 between logical material heard and the same
material seen. Bell³ reports a correlation of .13 between class standing
(average) and a simple learning test. This author quotes K. T.
Waugh’s results⁴ in which correlations of .24 were obtained between
class standing and substitution learning, and .40 between class standing
and logical memory, the subjects all being college students. King
and Homan⁵ report a correlation of .32 between logical memory (for a
prose passage) and school grades, the subjects being one hundred ten
college juniors and seniors. Johnson⁶ found the correlation between
intelligence as measured by a group of standard tests and learning as
measured by a rather complex test to be .34. Carothers⁷ found an
average correlation between digit-span (aud.) and seventeen other
tests of association, memory for words, cancellation, etc., of .17; a
correlation of .21 between logical memory and the same tests; and a
correlation of .21 between substitution learning and the same group of
tests. Recently Hazlett⁸ has reported a fairly definite relation
between general intelligence and memory-learning (as measured by
four tests) in a group of two hundred ninety-eight college women.





1 Correlation between Different Memories. Journal of Experimental Psychology, 
Vol. I, 1916, p. 404. 

2 Scientific Study of the College Student. Psychological Review Monographs, 
Vol. XXIII, No. 1, 1917.

3 Mental Tests and College Freshmen. Journal of Educational Psychology,
Vol. VII, 1916, p. 381.

4 “A New Mental Diagnosis of College Students.” N. Y. Times, Jan., 1916.

5 Logical Memory and School Grades. Journal of Educational Psychology,
Vol. IX, 1918, p. 262.

6 A Study of the Relation between Ability to Learn and Intelligence as Measured
by Tests. Journal of Educational Psychology, Vol. XIV, 1923, p. 540.

7 Psychological Examinations of College Students. Archives of Psychology,
Vol. XLVI, 1921, p. 58.

8 “Ability.” 1926, pp. 138ff.
An individual having a poor memory (in the lowest twenty-two per
cent by the tests) was found to be five times more likely to have 
inferior intelligence than an individual having a good memory (in the 
top twenty-two per cent). No correlations were computed. 

The Relation of General Intelligence to a Battery of Memory-learning 
Tests.—It is clear that the correlations of the Thorndike Test and our 
separate memory and learning tests are not significant as they stand. 
Will the correlation be substantially increased if the memory-learning 
tests are taken together in a team or battery? To answer this ques- 
tion, a multiple R was calculated between the Thorndike Test as the 
dependent variable and the eight memory-learning tests as independent 
variables. This $R_{1(23456789)} = .53$, which is about as high as the average
correlation obtained at Columbia between the Thorndike Test and
college grades. The reliability coefficient of the Thorndike Examination
is .85; that of the memory-learning tests approximately .93.¹
Hence we may estimate the “true” relationship between memory-learning
and general intelligence (using the well-known attenuation
formula of Spearman) to be

$$\frac{.53}{\sqrt{.85 \times .93}} = .60$$

This coefficient represents the relationship which we should expect 
to find between very many measures of memory-learning (of which 
our tests are representative) and very many measures of general 
intelligence; or between precise measures of the two traits. The 
size of the correlation indicates quite clearly that measures of memory 
and learning (even those of a relatively simple sort) are substantially 
related to reliable measures of intellect, provided enough different 
measures, or tests of enough different phases of the “function”
can be pooled together. Individually the tests are of little value as 
predictors of general level; but low intercorrelations cause them, when 
pooled, to measure a considerable share of the criterion without much 
repetition or overlapping. 
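The correction for attenuation used above can be sketched in a few lines of Python. This is only an illustration of Spearman's formula with the figures quoted in the text (.53, .85, .93); the function name `correct_for_attenuation` is a hypothetical label, not something taken from the study.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Divide the observed correlation by the geometric mean of the two reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Figures quoted in the text: multiple R = .53, reliabilities .85 and .93.
corrected = correct_for_attenuation(0.53, 0.85, 0.93)
print(round(corrected, 2))  # 0.6
```

The same function applies to any observed coefficient, provided the reliabilities of both measures are known.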

If we exclude the three “learning tests,” i.e., digit-symbol, Turkish-
English, and code, there are left five tests of immediate memory 
in which motor practice and the speed of connection-forming play a 
small part. To what extent are these tests, taken as a team, related 











1 This reliability coefficient was obtained by means of the Spearman formula 
for the correlation of sums and differences. British Journal of Psychology, Vol. V, 
1913, p. 417. The correlation was found between the sum of our eight memory- 
learning tests and the sum of eight identical tests. 
to general intelligence? The multiple R with Thorndike examination
is .44, which when corrected for attenuation becomes .52. This 
result indicates a fairly high relation between precise measures of 
immediate memory (both of the rote and meaningful variety) and 
general capacity. 

To the correlations above we may add the multiple R between
general intelligence and sensory-motor learning of the substitution
kind. With digit-symbol, Turkish-English, and code as our independent
variables, the R with Thorndike is .43, which when corrected for
attenuation becomes .48. This represents our best estimate of the
relation between accurate measures of substitution learning and
general level; it differs very little from the correlation between Thorndike
and the five memory tests.
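As a rough check on the multiple R for the three learning tests, the coefficient can be recomputed from the correlations printed in Table I. The sketch below assumes the standard result that R squared equals the sum of each standardized regression weight times the corresponding criterion correlation; the helper names `solve` and `multiple_r` are illustrative only, and the input values are the Table I entries as transcribed here.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_r(r_xx, r_xy):
    """Multiple correlation of a criterion with several predictors, from correlations alone."""
    betas = solve(r_xx, r_xy)  # standardized regression weights
    return math.sqrt(sum(b * r for b, r in zip(betas, r_xy)))


# Predictors: digit-symbol, Turkish-English vocabulary, code learning (Table I).
r_xx = [
    [1.00, 0.29, 0.57],
    [0.29, 1.00, 0.39],
    [0.57, 0.39, 1.00],
]
# Correlation of each predictor with the Thorndike examination (Table I).
r_xy = [0.09, 0.37, 0.31]

big_r = multiple_r(r_xx, r_xy)
print(round(big_r, 2))  # 0.43
```

The result agrees with the .43 reported in the text for the three learning tests.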

The multiple R’s which we have computed are of more than passing
interest because of the highly selected character of the group.
Memory is often thought of or implied to be a mechanical and routine
function opposed to “thinking” and “reasoning” which are taken to
constitute largely what we mean by intelligence. This is the view also,
to a large extent, of sensory-motor connection forming. Probably we
should expect to find a high correlation between general level and
memory in average and low grade individuals because, presumably,
their intellects are largely mechanical and routine in function. But to
find a substantial relationship between general level and memory-
learning (mostly of the routine sort) in those of superior intellect argues,
it seems to the writer, for the essential one-ness of intellectual behavior;
that memory-learning is fundamentally at the basis of even superior
intellectual performance, behavior which we are wont to call ingenious
or original. The r’s would no doubt be higher but for the greater
variability (larger potential number) of responses in the very superior.
The view that the good intellect differs from the poor mainly in number
of possible connections is Thorndike’s “quantity hypothesis” of
intellect.¹ Thorndike holds that there is no basic difference between
routine associative and connection-forming behavior and sagacious
or original intellectual endeavor; both depend upon the same physiological
connections, the latter, however, requiring many more than the
former. The “quantity hypothesis” will serve to explain the low
correlations between separate memory-learning tests and the Thorndike
examination. The latter test is an exceedingly thorough





1 “The Measurement of Intelligence.” 1927, pp. 415ff.
sampling of an individual’s intellectual responses. It would seem to
be obvious that no simple test of memory or sensory-motor learning 
could possibly embrace more than a limited number of these possible 
responses. Given enough tests of memory and sensory-motor learning, 
however, and these fairly unrelated inter se, and the combination 
should correlate highly with a measure of general level provided 
“general level” is not something fundamentally different. This 
seems to be what we have found. 

Attention should be called to the low, but consistent, negative 
correlations of age and our tests of memory-learning. If losses in 
memory-learning, however slight, appear regularly in an age range 
from sixteen to twenty-five or twenty-six years, it seems fairly prob- 
able that this loss would become substantial over a wider age extent. 
Logical memory is the only test which is not negatively related to 
age, the r being substantially zero. 

Summary.—We may summarize the foregoing discussion as 
follows: 

1. The correlations of eight memory-learning tests given to 
one hundred fifty-eight college men above the freshman grade were 
found to be low inter se and with general intelligence (Thorndike 
examination). 

2. When combined into a team, these eight tests correlate with 
Thorndike .53—corrected for attenuation .60. 

3. The multiple R’s between general intelligence and five memory 
tests and between general intelligence and three learning tests were .44 
(corrected for attenuation .52) and .43 (corrected for attenuation .48) 
respectively. 

The Question of a General Memory Factor.—The rather low inter- 
correlations of the memory-learning tests in Table I (the average is 
.20) suggest that we are dealing with relatively specific abilities 
instead of with a unitary trait which may be represented by the term 
“memory.” Several investigators, however, have reported a common 
memory factor within a set of tests.¹ For this reason, we decided
to apply to our data as represented in Table II, the statistical criteria
developed by Spearman, whereby the presence or absence of a “general
factor” may be established. Although any common underlying factor
in our tests would seem to be a very small one, yet it seemed worth 
while locating it if present. 








1 Spearman, C.: “The Abilities of Man.” 1927, pp. 287ff.
According to Spearman, the correlations which we find within a
group of tests are usually explainable in terms of a general factor
(called g) and specific factors (called s). The general factor runs
through or saturates all of the tested capacities and accounts for all
of the correlation except that due to chance errors, or to the overlapping
of specific factors which are not quite what their name implies.
If the overlapping becomes considerable, i.e., if we have correlation too
high to be accounted for by g alone, then the presence of “group” or
minor general factors is indicated. For example, the r between
two cancellation tests or two opposites tests is greater than can be
accounted for by g—obviously some other common elements (group
factors) must be present.

Spearman’s latest technique for determining whether or not a
general factor is operative in a set of mental tests is called the method
of “tetrad-differences.” A tetrad may be formed from any four
r’s in a correlation table which fall upon the corners of a rectangle.


TABLE II.—INTERCORRELATIONS OF EIGHT MEMORY-LEARNING TESTS

                                  (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)
1. Digit-span (vis.)              ...    .57    .03    .00    .03    .15    .20    .10
2. Digit-span (aud.)              ...    ...   -.05    .07   -.12    .10    .13    .15
3. Paired association (vis.)      ...    ...    ...    .59    .26    .31    .34    .26
4. Paired association (aud.)      ...    ...    ...    ...    .19    .12    .41    .18
5. Logical memory                 ...    ...    ...    ...    ...    .02    .20    .07
6. Digit-symbol                   ...    ...    ...    ...    ...    ...    .29    .57
7. Turk.-English vocabulary       ...    ...    ...    ...    ...    ...    ...    .39
8. Code learning                  ...    ...    ...    ...    ...    ...    ...    ...





























To illustrate, for the four tests, a, b, x, and y, the tetrad-difference
would be:

$$r_{ax} \times r_{by} - r_{ay} \times r_{bx}$$


Each TD (tetrad-difference) in a correlation table will equal zero, 
within the limits of the sampling error, provided one general factor and 
specific factors only are present. When the TD is reliably greater 
than zero, then—and only then—is there evidence of group factors. 

Table II yields two hundred ten different TD’s each of which 
should theoretically be zero if only one general factor plus specific 
factors are present. (The maximum number of different TD’s in a
correlation table can be determined from the formula $3 \times {}_{n}C_{4}$, in
which n equals the number of tests in the table.) Deviations from
zero will always arise, however, due to sampling errors, which makes it
necessary to compute the PE’s of the different TD’s in order to judge
just how significant is a given deviation from zero. The calculation of
two hundred ten PE’s is obviously a tedious process; hence to provide a
quicker test of g and s, Spearman has devised a shorter method. This
consists in comparing the “median deviation” from zero of the distribution
of obtained TD’s with the theoretical median deviation (or PE)
around zero which might be expected to arise from sampling errors
alone. If the obtained median deviation is less than the theoretical
PE, then the TD deviations from zero may be explained as fluctuations
due to sampling alone. On the other hand, if the obtained median
deviation is greater than the theoretical PE, the existence of a group
factor or factors over and above the general and specific is indicated.¹

Fig. 1—Showing the distribution of four hundred twenty tetrad-differences
compared with theoretical distribution to be expected from sampling errors alone.
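The enumeration of tetrad-differences described here can be sketched as follows. This is an illustrative reading of the procedure, not the author's own computation: it takes the Table II coefficients as transcribed in this copy, forms the three distinct differences available for every set of four tests, and reports their count and the median deviation from zero without regard to sign. The names `R`, `r` and `tetrad_differences` are hypothetical.

```python
from itertools import combinations
from statistics import median

# Upper triangle of Table II, keyed by (row test, column test).
R = {
    (1, 2): .57, (1, 3): .03, (1, 4): .00, (1, 5): .03, (1, 6): .15, (1, 7): .20, (1, 8): .10,
    (2, 3): -.05, (2, 4): .07, (2, 5): -.12, (2, 6): .10, (2, 7): .13, (2, 8): .15,
    (3, 4): .59, (3, 5): .26, (3, 6): .31, (3, 7): .34, (3, 8): .26,
    (4, 5): .19, (4, 6): .12, (4, 7): .41, (4, 8): .18,
    (5, 6): .02, (5, 7): .20, (5, 8): .07,
    (6, 7): .29, (6, 8): .57,
    (7, 8): .39,
}


def r(i, j):
    """Look up a correlation regardless of the order of the two test numbers."""
    return R[(min(i, j), max(i, j))]


def tetrad_differences(n_tests):
    """Return the three distinct tetrad-differences for every set of four tests."""
    tds = []
    for a, b, c, d in combinations(range(1, n_tests + 1), 4):
        tds.append(r(a, b) * r(c, d) - r(a, c) * r(b, d))
        tds.append(r(a, b) * r(c, d) - r(a, d) * r(b, c))
        tds.append(r(a, c) * r(b, d) - r(a, d) * r(b, c))
    return tds


tds = tetrad_differences(8)
print(len(tds))  # 210
print(round(median(abs(t) for t in tds), 4))  # obtained median deviation from zero
```

Each of the 210 values has a mirror image of opposite sign, which is why the text speaks of four hundred twenty TD's in all.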

From the frequency distribution of four hundred twenty TD’s²
obtained from Table II, the histogram of Fig. 1 has been constructed; 
and over this is drawn in the theoretical normal curve representing the 
distribution of TD’s which might be expected to arise from sampling 
errors. The mean of the two hundred ten different TD’s without 





1 Spearman, C.: Op. cit., Chap. X and Appendix. 
2 There are four hundred twenty TD’s in all since each calculated TD is 
matched by a TD numerically the same but different in sign. 
regard to sign is .0569 and the median is .0325. Both the mean and
the median of the distribution in Fig. 1 must be zero since the plus and 
minus halves of the curve exactly balance each other. 

Spearman has devised several formulas for calculating the
theoretical PE’s of sampling, viz., the PE around 0 as mean, when all
fluctuations or deviations from zero are taken as due to chance.
The earlier of these formulas¹ is

$$PE = \frac{1.349}{\sqrt{N}} \sqrt{p^{2} + s^{2}}$$

where $s^{2}$ = the mean squared deviation of all of the r’s in the table
from $\bar{r}$, the mean r; and $p = \bar{r}(1 - \bar{r})$. The later formula² is

$$PE = \frac{1.349}{\sqrt{N}} \sqrt{p^{2} + (1 - R)s^{2}}$$

in which s and p have the same meaning as in the formula above, and

$$R = 3\bar{r}\,\frac{n - 4}{n - 2} - 2\bar{r}^{2}\,\frac{n - 6}{n - 2}$$

n being the number of tests.
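A minimal sketch of the two probable-error formulas follows, assuming the readings PE = (1.349/√N)·√(p² + s²) for the earlier formula and PE = (1.349/√N)·√(p² + (1 − R)s²) for the later one, applied to the twenty-eight Table II coefficients as transcribed in this copy. The function name `tetrad_pe` is hypothetical. With these inputs the sketch gives roughly .026 for the earlier formula and .023 for the later one; this is close to the pair of values quoted in the text, although the text attaches them to the formulas in the opposite order, so the labelling should be treated with caution.

```python
import math

# The 28 intercorrelations of Table II, read row by row.
rs = [
    .57, .03, .00, .03, .15, .20, .10,
    -.05, .07, -.12, .10, .13, .15,
    .59, .26, .31, .34, .26,
    .19, .12, .41, .18,
    .02, .20, .07,
    .29, .57,
    .39,
]


def tetrad_pe(rs, n_subjects, n_tests):
    """Return (earlier, later) theoretical PE's of the tetrad-difference."""
    mean_r = sum(rs) / len(rs)
    s2 = sum((x - mean_r) ** 2 for x in rs) / len(rs)  # mean squared deviation from mean r
    p = mean_r * (1 - mean_r)
    scale = 1.349 / math.sqrt(n_subjects)
    earlier = scale * math.sqrt(p ** 2 + s2)
    big_r = (3 * mean_r * (n_tests - 4) / (n_tests - 2)
             - 2 * mean_r ** 2 * (n_tests - 6) / (n_tests - 2))
    later = scale * math.sqrt(p ** 2 + (1 - big_r) * s2)
    return earlier, later


earlier, later = tetrad_pe(rs, n_subjects=158, n_tests=8)
print(round(earlier, 4), round(later, 4))
```

The mean of the two results is about .0245, the figure the text adopts as the most probable value.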

The theoretical PE for the “chance TD’s” is .0231 by the first
formula and .0260 by the second. The mean value .0245 has been
taken as the most probable. The obtained MD is .0325 (the median
deviation from 0) which is somewhat larger than the theoretical PE.
This result indicates that not all, at least, of our TD’s can be explained
as deviations from zero due to sampling alone; and that in accordance
with Spearman’s theory there must be group factors present. Inspection
of the frequency distribution shows that there are twenty-seven
different TD’s (54 plus and minus) beyond the extreme limits of
5 × PE (i.e., ± .1225). In addition to these, there are twelve TD’s
in the step .0900-.1199 (see Fig. 1), a number six in excess (+3SD)³
of the maximum expected normal frequency for this interval. There
are, then, thirty-three different TD’s which cannot be explained by
one general factor plus specific factors, and which according to theory
must contain a group factor or group factors. Upon examination of













1 Spearman, C. and Holzinger, K.: The Sampling Error in the Theory of the 
Two Factors. British Journal of Psychology, Vol. XV, 1924, p. 17. 

2 Spearman, C.: Op. cit., Appendix XI, and formula 16a. 

3 Determined from Yule’s formula for the SE of the expected frequency within
a given step-interval. See “Introduction to the Theory of Statistics.” 1919, p.
309.
these too-large TD’s, it was found that one or another of the following
combinations of tests always appeared in them: Digit-span (vis.) and 
(aud.); paired associates (vis.) and (aud.); digit-symbol and code. 
The correlations between these pairs of tests are larger than the other 
intercorrelations in the table, and consequently their TD’s are larger. 
The obvious explanation for these too-large TD’s is the closely similar 
materials in the test-pairs. Apparently this “similarity of content” 
constitutes the common bond which is boosting the r’s between the 
designated test-pairs. The other one hundred seventy-five TD’s are 
not significantly greater than zero, and accordingly the other inter-r’s 
may be explained by a single general plus specific factors. 

If our group factors are what we have surmised them to be (i.e.,
similarity of content), we should be able to eliminate them by dropping
out one member of each of the following test-pairs: Digit-span (vis.)
and (aud.); paired associates (vis.) and (aud.); digit-symbol and
code. From the different combinations obtained by selecting these
tests, eight correlation tables can be drawn up, each containing five
tests (Turkish-English and logical memory are always included).
While it would appear logically that only a single general factor plus
specifics could be present in these tables (group factors having been
eliminated) this has been tested and found to be true. The largest
TD is .08 ± .04. The average intercorrelation within these eight
tables is .17, which figure might reasonably enough be taken to indicate
the extent of the common general factor.¹ Unfortunately a difficulty
arises in accepting this conclusion. Since Spearman’s general factor g
is mathematically derived, and is not a definite function which can be
experimentally isolated, it is impossible to say whether the general
factor within our tests is a genuine memory factor or simply “general
ability,” i.e., g. Two possibilities, then, seem to present themselves:
(1) Our general factor is a small but genuine memory factor; (2) it is
simply g, the memory factors being specific. Since our tests have all
been designed to measure memory and learning, the first alternative
would seem plausible; but the positive correlations of our tests with
general level (Thorndike) suggest that (2) might also apply.
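The eight five-test tables can be derived mechanically from Table II. The sketch below is an illustration only: it uses the Table II coefficients as transcribed in this copy, keeps logical memory (5) and Turkish-English (7) in every table, picks one member from each of the three similar-content pairs, and averages the ten intercorrelations of each table. The names `R`, `r`, `pairs` and `always` are hypothetical.

```python
from itertools import combinations, product

# Upper triangle of Table II, keyed by (row test, column test).
R = {
    (1, 2): .57, (1, 3): .03, (1, 4): .00, (1, 5): .03, (1, 6): .15, (1, 7): .20, (1, 8): .10,
    (2, 3): -.05, (2, 4): .07, (2, 5): -.12, (2, 6): .10, (2, 7): .13, (2, 8): .15,
    (3, 4): .59, (3, 5): .26, (3, 6): .31, (3, 7): .34, (3, 8): .26,
    (4, 5): .19, (4, 6): .12, (4, 7): .41, (4, 8): .18,
    (5, 6): .02, (5, 7): .20, (5, 8): .07,
    (6, 7): .29, (6, 8): .57,
    (7, 8): .39,
}


def r(i, j):
    """Look up a correlation regardless of the order of the two test numbers."""
    return R[(min(i, j), max(i, j))]


# One test is kept from each pair: digit-span, paired associates, digit-symbol/code.
pairs = [(1, 2), (3, 4), (6, 8)]
always = (5, 7)  # logical memory and Turkish-English are in every table

table_means = []
for choice in product(*pairs):
    tests = sorted(choice + always)
    rs = [r(i, j) for i, j in combinations(tests, 2)]  # ten r's per table
    table_means.append(sum(rs) / len(rs))

overall = sum(table_means) / len(table_means)
print(len(table_means), round(overall, 2))  # 8 0.17
```

The overall average agrees with the .17 reported in the text.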

We have tried to settle the question by “partialling out” Thorndike
score—which is taken to be a fair measure of general ability—from 
each of the r’s in our eight tables. Presumably, the average r of .17 
should be little, or not at all, reduced if (1) is true; it should be 0 or 


1 These tables have not been published as they can easily be derived from
Table II.
close to it, if (2) is true. Actually the residual r’s obtained from the
eight tables average .11, which suggests the existence of a genuine, if
extremely small, general memory factor running through our tests
apart from general ability or g. This conclusion, unfortunately,
must be tentative, as the PE’s of our intercorrelations are all around
.04 and .05. Moreover, as we do not know just how adequate a
measure of Spearman’s g is the Thorndike Test, g may not have been
completely eliminated. On the other hand, since the Thorndike
examination admittedly involves memory and learning we have almost
certainly “partialled out” too much, so that the r of .11 is really too
small.

Spearman and his students have found evidence of the existence 
of a memory factor. The most complete study of this particular 
problem to come from Spearman’s laboratory is that of N. Carey.¹ 
Carey set out to discover among other things whether there is a 
common memory factor among widely differing contents. Memory 
for language—verbal memory—was compared with memory for 
sensory data (both visual and auditory). The first was represented 
by sentences, unconnected words, and words associated with numbers, 
etc.; the second by colors, angles, patterns, musical pitch, metronome 
beats, etc. Correlations calculated between these measures were 
freed from the influence of g by means of special formulas devised by 
Spearman. The resulting “specific” r’s should all theoretically be 
zero, provided the correlations are due to g and s alone. As a matter of 
fact, the r between pooled tests of verbal memory and auditory, i.e., 
sensory, memory was .19 ± .05 (with g out); between pooled tests of 
verbal memory and visual memory .13 ± .05. These r’s, according to 
Carey, point to the presence of a “very small” common memory 
factor over and above g. 

Even with the qualification, Carey’s conclusion (like ours above) 
is open to considerable doubt. The PE’s of the “specific” r’s between 
verbal and auditory memory as well as those between verbal and visual 
memory are too high to render these values trustworthy. Further- 
more, the self-correlations of Carey’s tests were, in many cases, 
low, several being in the .30’s and .40’s. The size of the sample, which 
seems to have varied around one hundred fifty, would seem to have 
been adequate. 





1 Factors in the Mental Processes of School Children, II. British Journal of 
Psychology, Vol. VIII, 1915-17, pp. 8, 70. 
SUMMARY 


1. The low intercorrelations of our memory-learning tests (around 
.21) suggested that a common memory factor if present must be small. 
On eliminating one member of each of those test pairs which introduced 
group factors (presumably similarity of material or content) it was 
found that the common memory factor is represented by an average 
intercorrelation which is almost certainly as large as .11 and may 
be as large as .17. 

2. The tentative conclusion is drawn that there is a small memory 
factor present within our tests. 
THE SAMPLING THEORY AS A VARIANT OF THE TWO 
FACTOR THEORY 


JOHN MACKIE 


Department of Education, Edinburgh University 


The Sampling Theory of Ability as propounded and developed 
by Professor Godfrey Thomson has been objected to on three main 
grounds. 

In the first place, the probability that mental factors would be 
arranged among a set of abilities so as to make the resultant correla- 
tions conform with the laws which seem actually to govern these 
correlations is, it is said, so slight as to render the theory unworthy of 
acceptance. Secondly, even granting that the theory is mathemati- 
cally possible, it is argued that by means of certain transformations 
it can be shown to be a variant of the Theory of Two Factors, that, in 
fact, the mathematical expressions for the two theories can be made to 
agree. And, thirdly, it is held that the consequences of such a theory 
lead to conclusions which are psychologically indefensible and contrary 
to the facts of experience. 

It is the second of these that appears the most serious, and it is 
with it we propose here to deal. Were it the case, indeed, that the 
two theories could be made to agree in their mathematical expressions, 
and that these expressions could be reasonably interpreted in terms of 
each other, then the theories would really differ not in their basic 
ideas but almost solely in their verbal statement. That it is an 
important question is recognised by Professor Spearman, who includes 
the Sampling Theory in his “Sub-theories of the Two Factors;”¹ 
in a later article he refers to Garnett’s having “plainly set forth 
the general factor in Thomson’s own scores;”² while in his recent 
book³ he says: “As Garnett proceeded to show, the V (i.e., the measure 
of a given quality) could equally well be divided instead as follows: 


$$V = g + s,$$


. . It appears, then, that each of Thomson’s v’s had really introduced 
a little bit out of the g together with a little bit of the s,,” etc. Clearly 





1 Manifold Sub-theories of “The Two Factors.” Psychological Review, Vol. 
XXVII, 1920. 

2 British Journal of Psychology, Vol. XVII, 1927, p. 324. 

3 “The Abilities of Man.” Appendix, pp. vi, vii. 
the reconciliation of the Sampling Theory with the Theory of Two 
Factors is regarded, in our opinion rightly so, as of cardinal importance. 

In considering this question it is important to realise that there is a 
difference in the methods of approaching the problem in these two 
theories, both of which methods are scientifically sound. We shall 
first examine the Two Factor Theory as set forth by Garnett.¹ 

The assumption is that any mental quality is due to the operation 
of certain variable elementary factors, which for generality we must 
assume to be at least as numerous as the qualities under consideration. 
It is then shown that, if we take n mutually perpendicular axes in 
n-dimensional space to represent the elements, the n qualities may (if 
we have chosen suitable units in our measurements) be represented by 
directed lines, any one of which we may picture as leaning near to those 
axes upon which it depends most and farther from those of which it is 
more independent. The correlation between two qualities is then 
shown to be measured by the cosine of the angle between the two 
representative lines. 
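
The cosine relation just stated can be checked numerically. The sketch below is an illustration only: the two sets of loadings (`line_a`, `line_b`), the choice of three elementary factors, and the number of simulated subjects are hypothetical values chosen for the demonstration and are not taken from Garnett's paper. It assumes the elementary factors are independent and of unit variance.

```python
import math
import random


def cosine(u, v):
    """Cosine of the angle between two directed lines given by their components."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Product-moment correlation of two equally long lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical loadings of two qualities on three independent elementary factors.
line_a = (0.6, 0.8, 0.0)
line_b = (0.8, 0.0, 0.6)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
scores_a, scores_b = [], []
for _ in range(20000):
    factors = [rng.gauss(0.0, 1.0) for _ in range(3)]
    scores_a.append(sum(w * f for w, f in zip(line_a, factors)))
    scores_b.append(sum(w * f for w, f in zip(line_b, factors)))

angle_cosine = cosine(line_a, line_b)    # 0.48 for these loadings
sample_r = pearson(scores_a, scores_b)   # agrees with the cosine up to sampling error
```

With these loadings the cosine is .48, and the correlation computed from the simulated scores should differ from it only by sampling fluctuation.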

Now, we may assume that qualities are due to the operation of 
variable factors, and yet set out from the standpoint that we know 
nothing else about these factors, but shall take whatever factors seem 
to express our facts most simply. Pursuing the geometrical method, 
what we have to begin with is a set of lines in n-dimensional space 
making certain known angles with each other; we have no axes of 
coordinates, but are going to fit on any set which seems most suitable. 
This is analogous to setting out to discuss a plane figure by means of 
coordinate geometry and drawing the axes XOY in whatever position 
we like. 

Garnett proceeds to show that if the set of correlations satisfies the 
tetrad equation there is a certain constant less than unity associated 
with each line which may be taken as the cosine of an angle, and that 
there is a uniquely determined line in (n + 1)-dimensional space which 
makes these angles with the existing lines. Taking this uniquely 
determined line as one axis Og, we can draw n other axes, each 
perpendicular to it and to each other. Lastly, keeping Og fixed, the 
other axes are whirled round, so to speak, to a certain position, in 
which it is found that the plane $gO\xi_1$ contains one of the lines we began 
with, $gO\xi_2$ a second, $gO\xi_3$ a third, and so on. It is clear that whatever 





1 On Certain Independent Factors in Mental Measurements. Proceedings of 
the Royal Society, Vol. XCVI, A, pp. 91 et seq. We have taken the liberty of 
paraphrasing the attractive analysis contained in that paper. 
entities be represented by $Og$, $O\xi_1$, $O\xi_2$, . . ., $O\xi_n$, each quality 
depends upon g and one of the ξ’s and upon nothing else. Hence, if 
we start by assuming ignorance of the nature of the underlying elements, 
we may, if we please, define whatever is represented by g and the ξ’s 
as elementary factors, of which g is general and all the rest specific. 
The task of the psychologist as opposed to the mathematician would 
now be to investigate whether this theory is a likely one, and what 
psychological meanings can be given to the factors thus mathemati- 
cally defined; and this is indeed the next step carried out by Professor 
Spearman in his book.¹ It is to be observed however, and this has 
always been recognised, that the entities represented by these (g and 
ξ’s) may not actually exist; and on the other hand it is only fair to admit 
that, in this respect, the theory is in no worse case than many accepted 
theories in the domain of physical science. 

Turning now to the Sampling Theory, we note that it begins with a 
hypothesis which, a priori at least, is admitted to be psychologically 
plausible. There are assumed to be a number of elementary factors, 
each of which acts either with its full force or not at all, and the 
various mental qualities are assumed to be samples of those elements. 
Can we form a geometrical picture of these so as to compare it with 
the other theory? We can, by regarding a set of N axes as representing 
the N elements. If a certain quality $Q_1$ depends on s of the 
elements, then the s corresponding axes will define an s-dimensional 
space, in which we may imagine a line representing the quality $Q_1$. 
This line will be equally inclined to all the s axes; for, since each of the s 
elements acts with its full force, the quality $Q_1$ will depend on the s 
elements equally. Thus we have a set of n lines lying in a space of N 
dimensions, the cosine of the angle between any pair being the correla- 
tion between the two qualities represented. Now these correlations 
may, under certain conditions, satisfy exactly the tetrad equation. It 
would seem correct therefore to think, and Garnett proves,² that we 
can, in this event, replace our N axes by a different set of n + 1 axes 
where, as before, one of these along with each of the others in turn 
delimit a plane containing one of the representative lines. On this 
ground the Sampling Theory is held to be but a variant of the Two 
Factor Theory. We note in passing that the validity of the Sampling 





1 “The Abilities of Man.” Chap. VII. 

2 The Single General Factor in Dissimilar Mental Measurements. British 
Journal of Psychology, Vol. X, 1920, pp. 242 et seq. What follows in the present 
article refers to the transformation effected by Garnett in the place mentioned. 
Theory does not depend on its exact fulfilment of the tetrad equation 
criterion. But if we admit that a tendency to fulfil that criterion 
indicates a tendency for the qualities to be due to a general factor and 
specific factors the discussion thereupon involves the whole theory. 
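
The geometrical picture of the Sampling Theory can be made concrete with a small numerical sketch. Everything specific in it is an assumption made for illustration: sixteen elementary factors (`N = 16`), four qualities, and a binary-digit rule for deciding which elements each quality samples. That arrangement was chosen because it is one in which the overlaps are such that the tetrad differences vanish exactly; it is not drawn from Thomson's or Garnett's data.

```python
import math
from itertools import combinations

N = 16  # hypothetical number of elementary factors

# Quality i samples every element whose i-th binary digit is 1 (eight elements each).
qualities = [{e for e in range(N) if (e >> i) & 1} for i in range(4)]


def sampling_r(a, b):
    """Cosine between two equally weighted samples: overlap / sqrt(size_a * size_b)."""
    return len(a & b) / math.sqrt(len(a) * len(b))


# Correlations between every pair of the four qualities.
r = {(i, j): sampling_r(qualities[i], qualities[j])
     for i, j in combinations(range(4), 2)}

# The three tetrad differences that can be formed from four qualities.
tetrads = [
    r[(0, 1)] * r[(2, 3)] - r[(0, 2)] * r[(1, 3)],
    r[(0, 1)] * r[(2, 3)] - r[(0, 3)] * r[(1, 2)],
    r[(0, 2)] * r[(1, 3)] - r[(0, 3)] * r[(1, 2)],
]
```

Here every pair of qualities shares four of its eight elements, so each correlation is one-half and all three tetrad differences are zero. Other overlap patterns would in general leave tetrad differences that are small but not zero.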

Now there is no question that if the tetrad equation is satisfied 
exactly, the qualities can be regarded as due to a single general factor 
and specific factors. From one point of view, this is a geometrical 
theorem in n + 1 dimensions. And, if any other theory were 
advanced upon which the tetrad equation were exactly satisfied, there 
is no doubt that it could be forced into apparent agreement with the 
two factor theory. But having started from a hypothesis about our 
original axes we are entitled, before abandoning them, to examine the 
relation between them and the new set proposed to replace them. 
Garnett shows that when the factors are so distributed among the 
qualities that the tetrad differences are zero, 

$$g = \frac{1}{\sqrt{N}} \sum x.$$

He offers the following psychological interpretation of this equation in 
terms of the hypothesis underlying the Sampling Theory: “The whole 
number of a subject’s neurones rendered active by an effort of his will 
would then be proportional to his g.”¹ He does not, however, seek to 
interpret the resultant “specific factors.” It is this interpretation we 
proceed to find, and, as we shall see, the interpretation is a very 
unsatisfactory one. 


Geometrically the equation $g = \frac{1}{\sqrt{N}} \sum x$ means that the axis Og 
is drawn equally inclined to all the original axes. This means that g 
is a measure of some activity calling for the action of every elementary 
factor possessed by the mind. The factor $\frac{1}{\sqrt{N}}$ does not signify that 
each element enters with only $\frac{1}{\sqrt{N}}$ of its force; it is merely a factor 
which ensures that the measure of g will have the same standard 
deviation as the other measures, merely a factor defining units. 
The general factor g is thus the whole mind, not only those elements 
capable of entering into two or more activities. How then can we 
have other factors independent of such a factor as this? 





1 Loc. cit., p. 257. 





































In the figure $OX_1$, $OX_2$, . . ., $OX_N$ are the positive directions of 
the axes representing (in N-dimensional space) the elementary factors; 
Og a line equally inclined to all of them. $OQ_1$ is a line equally inclined 
to those axes concerned in $Q_1$; $OR_1$ a line equally inclined to all the rest. 
Now since $OQ_1$ and $OR_1$ are independent, they are perpendicular to 
. . . tetrad equation is satisfied or not; and Garnett’s analysis shows that, 
if the qualities do satisfy that equation exactly, the ξ-axes are 
perpendicular to each other also. Following the diagram we are able to 
interpret the factor $\xi_1$. It is something which is helped by the factors 
making up $Q_1$ and hindered by the factors which do not; while g is 
helped by both sets. It would be possible, say by increasing the values 
of both sets by properly chosen amounts, to increase g and leave $\xi_1$ 
unaltered, or by increasing the value of one set and diminishing that 
of the other to leave g unaltered while altering $\xi_1$, so that g and $\xi_1$ are 
independent. We are led then to describe the dependence of $Q_1$ upon 
g and $\xi_1$ in the following language: $Q_1$ is a quality which depends: 
(1) partly upon the action of the whole set of mental factors possessed 
by the brain; and (2) partly upon the action of some composite power 
which in turn is aided by a certain set of those factors and hindered by 
all the others; the degrees to which these two contributions take place 
being such that the action of the certain set mentioned is made com- 
plete, while that of the others is completely nullified. This sounds 
somewhat absurd, for we are only saying again that $Q_1$ depends on the 
action of a certain set of factors and not on the others at all. This 
interpretation of the “specific” factors $\xi_1$, etc., differs from that 
usually implied. If we ask “What, in the Theory of Two Factors, 
is that which enables a boy to do Latin?” it is reasonable to be told: 
“Partly a general factor and partly a factor specifically devoted to 
Latin.” If we start with the Sampling Theory, and allow the 
transformation of Garnett’s (which is mathematically possible under the 
limiting conditions we are considering) to be made, and are then asked 
the same question, we must answer: “Partly his whole brain, and 
partly that part of his brain which is active in Latin hindered by the 
rest of his brain.” The “specific” factor is thus partly due to those 
elements active in Latin hindered by all the rest. 
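
The "helped and hindered" reading of the specific factor can be seen in a small worked case. The sketch below is only an illustration under stated assumptions: sixteen elementary factors (`N = 16`) and four qualities, each sampling the eight elements picked out by one binary digit. This hypothetical arrangement satisfies the tetrad equation exactly, so the transformation can be carried through. The names `g_axis`, `lines`, and `specifics` are introduced here for the demonstration.

```python
import math

N = 16  # hypothetical number of elementary factors
qualities = [{e for e in range(N) if (e >> i) & 1} for i in range(4)]


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# The general axis, equally inclined to all N original axes.
g_axis = unit([1.0] * N)

# Each quality is a line equally inclined to the axes of the elements it samples.
lines = [unit([1.0 if e in q else 0.0 for e in range(N)]) for q in qualities]

# The specific axis of a quality is what remains of its line once g is taken out.
specifics = []
for line in lines:
    loading = dot(line, g_axis)
    residual = [x - loading * gx for x, gx in zip(line, g_axis)]
    specifics.append(unit(residual))

# Which original elements help, and which hinder, the first specific factor?
xi_1 = specifics[0]
helped = {e for e in range(N) if xi_1[e] > 0}
hindered = {e for e in range(N) if xi_1[e] < 0}
```

In this case `helped` is exactly the set of elements sampled by the first quality and `hindered` is all the rest, while the specific axes come out perpendicular to g and to one another. That is the interpretation described in the text: the specific factor is the sampled elements working against every other element of the mind.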

The only way out of this absurdity is either to abandon the Sampling 
Theory altogether, i.e., to scrap our first set of axes in favour of the 
new ones, or to say that, while the transformation is of course possible 
(for what a chance we have of drawing axes to our pleasure in n 
dimensions!), the interpretation of the new variables in terms of the 
original ones is such that they cannot be accepted as representing 
any entities of which we can satisfactorily conceive. As we have 
stated above, it is scientifically sound to assume a hypothesis regarding 
the entities causing any set of phenomena; and if a mathematical 
transformation enables us to postulate a new set of entities which 
are not satisfactorily expressible in terms of our hypothesis we are 
quite justified if we decline to make that transformation and adhere 
to our original hypothesis. In the familiar equation in physics, pv = 
RT, it would be easy to replace the axes of p and v by two others in 
the same plane, and obtain a new equation T = φ(q, w), where q would 
be something dependent on p and v, and w dependent on p and hindered 
by v. As such a transformation would be meaningless, we should not 
make it. 

We may venture to make our argument clearer by an analogy. 
Suppose that some archaeologists find the ruins of three temples, one 
of which had been made of brick and stone, another of brick and 
wood, and the third of stone and wood. Some of the archaeologists 
might say that in the building of each temple we see the operation of 
two factors, one of which is common to all; this might be the religious 
urge of the nation which built them. The factors peculiar to each of 
the temples might be respectively: (1) Absence of trees in the locality, 
(2) want of stone, and (3) abundance of stone and wood. Others of 
the archaeologists say that the nation in question possessed skilled 
builders and that the three temples are samples of their work, the first 
being due to brickmakers and masons, the second to brickmakers and 
carpenters, and the third to masons and carpenters. The first 
school of archaeologists then claim that the second theory is but 
a variant of theirs, for, say they, we can make a mathematical 
transformation by means of which the brickmakers, masons and 
carpenters are replaced by four influences, one affecting all three 
temples and the other three one each. If it is inquired what those 
influences are, the reply must be: The general influence is the work of 
the whole set of builders; the specific influence in the first temple is the 
work of the brickmakers and masons hindered by the carpenters, and 
similarly for the others.¹ A sort of mathematical equation would be: 


Brick-and-stone Temple = (Work of brickmakers, masons and carpenters 
working with part of their might) + (Work of brickmakers and masons 
working with the rest of their might − Work of carpenters already done). 





1 By taking three axes to represent the work of the three sets of builders and 
supposing their respective shares to be equivalent, we could actually carry out this 
transformation. The “correlation” between any two of the temples would be 
one-half; the new g axis would be equally inclined to the original axes. With only 
three things under consideration, as Garnett points out, we can always get a g axis, 
without any conditions. 
It would then be open to the second set of archaeologists to point out 
that this was just what they said at first, except that the carpenters 
had been brought in to build and then knock down what they had built; 
and that they might justifiably prefer to think that the carpenters 
were never there at all. As long as they chose to keep to their own 
hypothesis about the groups of builders it would be true to say that 
their theory could not be regarded as a variant of the other one. 

We conclude then that it is only in the most formal mathematical 
sense that the Sampling Theory can be brought in under the Two 
Factor Theory; that if we adhere to the hypothesis underlying the 
Sampling Theory the interpretation we are compelled to put upon 
the specific factors obtained by the mathematical transformation is 
such as to show that these factors are mere mathematical fictions. 
Per contra, if, having arrived at the Two Factor Theory, we make the 
transformation from it, then the elements of the Sampling Theory 
expressed by means of the transformation are mere mathematical 
fictions. If either theory should be abandoned, it is not because they 
are equivalent. 
A STUDY OF THE VALIDITY OF THE DOWNEY WILL- 
TEMPERAMENT TEST 


ROBERT S. THOMPSON 


University of Denver 


This investigation attempts to throw light upon the validity of 
the Downey Will-temperament Group Test from the standpoint of 
its differential value in the prediction of success and failure in the 
activity known as practice teaching. The subjects were the members 
of the two classes in practice teaching in the University of Denver 
Training School for the first and second semesters of the school year 
of 1926-27. 

The work in practice teaching may be characterized briefly as 
follows: Students first observe teaching by the members of the staff 
and then are assigned to a room and grade where they teach usually 
from three to five hours a week under the direct supervision of a critic 
teacher. The schools used are the University Park Elementary School, 
the Grant Junior High School and the South High School, all units of 
the Denver Public School System. The work is under the direction of 
a professor of education who is also principal of the University Park 
School and has charge of the training in the other schools. Part of 
the routine of the courses consists in periodical criticism and rating of 
the student teachers by the critic teacher and by the principal after 
conference. Each student is given a rating, arrived at by having him 
rate himself on a schedule of traits, and through the rating of both the 
critic teacher and the principal. Thus systematic rating is part of 
the work. 

The group was given the Terman Group Test of Mental Ability, 
Form B, the average grade in all subjects for the entire college career 
of the individual was calculated, and each subject filled out a question- 
naire designed to gather information as to the hours of study, campus 
activities, number of credit hours carried and the amount of work for 
self-support. They were then given the Downey Will-temperament 
Group Test by the writer. This procedure gave measures of intelli- 
gence, scholarship, hours spent in study for practice teaching and the 
trait scores on the Downey tests. 

Out of eighty-eight records only seventy-four were complete. 
Several did not take one or the other of the tests, some had quite 
evidently misunderstood some of the instructions of the Downey tests 
and several were left-handed. Of this group of seventy-four, ten were 
men and sixty-four women. All were college juniors or seniors. 

It was thought that the activity of practice teaching offered peculiar 
advantages for trying out the test because such factors as the control 
of children in a new situation, the administration of classroom routine, 
the planning of the work, etc., involve executive abilities which must be 
dependent in a large measure on the “will-temperament.” The 
situation is quite different from that of the ordinary classroom. 

Many of the attempts to validate the Downey test have used the 
procedure of correlating the test scores with ratings or estimates of 
the amount of the trait by judges. That of Meier¹ is a good example. 
But as May² says: “The results of all these attempts at validation 
by the rating method are uniformly ambiguous. Nearly all correla- 
tions are low. Does this mean that the tests are not valid, or the 
ratings unreliable, or that this is not proper method of testing the 
tests?” In addition to the fact that the estimates themselves may 
be off, there is the consideration that the traits as named and 
described by Miss Downey are so subtle and elusive when verbally 
expressed that the judges have no clear idea in mind as to what they 
are rating. 

In this study no effort was made to have the subjects rated by the 
critic teachers on the traits purported to be tested. In such a rating 
the teachers would probably possess no superiority over another sort 
of judge, whereas by training and experience they were expert in 
rating the subjects as to success in practice teaching. 

For the purposes of the investigation, it was roughly assumed that 
important factors in the success of the student-teacher must include 
(1) intelligence, (2) scholarship, (3) hours of study spent in preparation, 
and (4) the various traits measured by the Downey test. The aim was 
to correlate measures of these various factors with the practice teaching 
rating, and then apply the technique of partial and multiple correlation 
to find the relative significance and combination effects of the factors. 

In Table I are given the simple correlations of the measures with 
the practice teaching rating. 

An inspection of the table reveals how disappointingly low the 
correlations are, those of intelligence and scholarship being the only 





1 Meier, N. C.: A Study of the Downey Test by the Method of Estimates. 
Journal of Educational Psychology, Vol. XIV, 1923, pp. 385-395. 

2 May, Mark A.: Present Status of the Downey Will-temperament Test. 
Journal of Applied Psychology, Vol. IX, 1925, pp. 29-52. 
ones which are more than four times the probable error. Hours of 
study seem for this group generally to be unrelated to practice-teaching 
proficiency. Of the Downey trait measures only that of speed 
of decision approximates four times its probable error although self- 
confidence, speed of movement and flexibility seem to have some slight 
relationship. 
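
The "four times the probable error" criterion can be illustrated with a short calculation. This is a sketch under assumptions: it uses the classical formula for the probable error of a correlation coefficient, PE = 0.6745 (1 − r²)/√N, and takes N as the seventy-four complete records. The article does not state which formula was used, so agreement with the printed PE's is a check rather than a reconstruction of the author's working.

```python
import math


def probable_error_r(r, n):
    """Classical probable error of a correlation: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


n = 74  # number of complete records reported in the article

# Coefficients with practice teaching rating, as reported in the text.
coefficients = {
    "intelligence": 0.38,
    "average college grades": 0.37,
    "speed of decision": 0.27,
    "hours of study": -0.02,
}

probable_errors = {}
ratios = {}
for name, r in coefficients.items():
    pe = probable_error_r(r, n)
    probable_errors[name] = pe
    ratios[name] = abs(r) / pe  # how many probable errors the coefficient spans
```

On these assumptions the probable errors round to .07 for the first three measures and .08 for hours of study. The ratios for intelligence and college grades exceed five, and that for speed of decision falls a little short of four, in line with the statement that it only approximates four times its probable error.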


TABLE I 

PRACTICE TEACHING RATING AND                            r       PE 

Intelligence ........................................  +.38   ± .07 
Average college grades ..............................  +.37   ± .07 
Hours of study ......................................  -.02   ± .08 
Speed of decision ...................................  +.27   ± .07 
Self-confidence .....................................  +.20   ± .07 
Speed of movement ...................................  +.19   ± .07 
Flexibility .........................................  +.18   ± .07 
[illegible] .........................................  +.14   ± .07 
[illegible] .........................................  +.09   ± .08 
Coordination of impulses ............................  +.07   ± .08 
[illegible] .........................................  +.07   ± .08 
[illegible] .........................................  +.02   ± .08 
Non-compliance ......................................  -.17   ± .07 
Finality of judgment ................................  -.17   ± .07 
Volitional perseveration ............................  -.19   ± .07 
Sum of scores (speed of decision, speed of movement, 
  self-confidence and volitional perseveration) .....  +.33   ± .08 


Taking the sum of the scores on the four traits of the Downey test 
showing the highest relationship (speed of decision, speed of movement, 
self-confidence and flexibility), we get a coefficient of .33. However, 
there is some doubt as to whether or not this adding of scores is 
proper. Miss Downey has been quick to point out that some traits 
may bear an inverse relationship to each other. In a summation process 
there might be a tendency to cancellation or undue inflation. 
Some basis may also be found in this to urge that high correlations 
with the separate tests should not be expected. The will-temperament 
may be composed of so many complexly related traits that it will only 
give up its secrets when the relationships are measured rather than 
the magnitudes of separate traits. While in an intelligence test each 
item must prove its validity separately, such may not be the case in a 
temperament test. 

Probably the most startling of the coefficients is that of volitional 
perseveration (—.19). Certainly most judges would say this must 
be a highly valued quality in teaching. It is easier to understand why 
the other traits with negative coefficients (non-compliance and finality 
of judgment) might hinder an adjustment during the apprenticeship 
stage. 
































TABLE II 

                               Partial correlations   Multiple correlations    PE 

(1) Practice teaching          r12.3     .23          R1(23)     .43           .06 
(2) Intelligence               r13.2     .21 
(3) Scholarship                r23.1     .46 

(1) Practice teaching          r12.34    .20          R1(234)    .49           .06 
(2) Intelligence               r13.24    .23 
(3) Scholarship                r14.23    .26 
(4) Speed of decision 

(1) Practice teaching          r12.34    .20          R1(234)    .46           .06 
(2) Intelligence               r13.24    .23 
(3) Scholarship                r14.23    .19 
(4) Self-confidence 

(1) Practice teaching          r12.34    .25          R1(234)    .45           .06 
(2) Intelligence               r13.24    .17 
(3) Scholarship                r14.23    .15 
(4) Speed of movement 

(1) Practice teaching          r12.34    .22          R1(234)    .44           .06 
(2) Intelligence               r13.24    .21 
(3) Scholarship                r14.23    .08 
(4) Flexibility 

(1) Practice teaching          r12.34    .22          R1(234)    .44           .06 
(2) Intelligence               r13.24    .21 
(3) Scholarship                r14.23    .12 
(4) Motor inhibition 

(1) Practice teaching          r12.345   .16          R1(2345)   .52           .06 
(2) Intelligence               r14.235   .28 
(3) Scholarship                r15.234   .21 
(4) Speed of decision 
(5) Self-confidence 

(1) Practice teaching          r12.3     .28          R1(23)     .33           .07 
(2) Speed of decision          r13.2     .21 
(3) Self-confidence 

NOTE: The correlation between intelligence and college grades was .54. 
Table II shows the results when partial and multiple correlation were 
applied to various combinations of scores. 

It will be observed that when intelligence score is correlated with 
practice teaching rating, and the scholarship measure is partialed out, 
the coefficient is reduced from +.38 to +.23. Practically the 
same reduction takes place when the scholarship measure is compared 
with the practice teaching rating and intelligence partialed out. 
The multiple correlation of intelligence, scholarship, and practice 
teaching is +.43. It might be noted in passing that for this group the 
correlation of intelligence and scholarship is +.54. 
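
The first block of Table II can be recomputed from the three simple correlations given in the article: practice teaching with intelligence (+.38), practice teaching with scholarship (+.37), and intelligence with scholarship (+.54, per the note to Table II). The sketch below assumes the standard first-order partial correlation formula and the standard three-variable multiple correlation formula; the function names are introduced here for illustration, and the article does not describe its own computing procedure.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_r(r_1a, r_1b, r_ab):
    """Multiple correlation of variable 1 with predictors a and b together."""
    r_squared = (r_1a ** 2 + r_1b ** 2 - 2 * r_1a * r_1b * r_ab) / (1 - r_ab ** 2)
    return math.sqrt(r_squared)


# (1) practice teaching, (2) intelligence, (3) scholarship
r12, r13, r23 = 0.38, 0.37, 0.54

r12_3 = partial_r(r12, r13, r23)   # intelligence, scholarship partialed out
r13_2 = partial_r(r13, r12, r23)   # scholarship, intelligence partialed out
r23_1 = partial_r(r23, r12, r13)   # intelligence and scholarship, rating partialed out
R1_23 = multiple_r(r12, r13, r23)  # rating predicted from both measures
```

Rounded to two places these come to .23, .21, .46 and .43, which agree with the corresponding entries of Table II.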

A number of other partial and multiple correlations were worked 
in which practice teaching, intelligence, scholarship and several of 
the Downey trait scores were compared. (See Table II.) In the case 
of the partial correlation of the Downey trait scores with practice 
teaching rating, with intelligence and scholarship partialed out, the 
figures show that the correlations involving speed of decision, self- 
confidence, speed of movement and motor inhibition are not decreased 
in size when the two factors are held constant. Evidently these scores 
represent a trait which is independent from those standing in a causal 
relation to intelligence and scholarship. In the partial correlation 
involving flexibility, the coefficient is reduced from +.18 to +.08, 
indicating that this trait is measured in the intelligence and scholarship 
tests. In the partial correlation using practice teaching rating, speed of 
decision and self-confidence, the original correlations are not affected. 

The highest multiple correlation coefficient obtained was that 
which used the four measures of practice teaching, intelligence, scholar- 
ship and speed of decision, +.49. This compares with the correlation 
between intelligence and scholarship of +.54. A number of other coefficients 
were obtained above +.40 by using other Downey trait measures 
in conjunction with the other measures. When the two Downey 
trait scores showing the highest original correlations with practice 
teaching ratings were correlated jointly with this rating, the coefficient 
was +.33, which compares with the coefficients of +.38 and +.37 
between practice teaching rating, and intelligence, and scholarship 
respectively. 

The writer was disappointed that this investigation was started 
before the results of Miss Downey’s study of the reliability of the test 
were published.¹ The tests were scored according to the norms published 
in the first revision of Miss Downey’s “Manual.” Miss Downey 
suggests various methods of scoring in her recent study to increase the 
reliability, and the writer regrets that these suggestions were received 
too late to be of use. 

1 Downey, June E.: Reliability of Group Will-temperament Test. The 
Journal of Educational Psychology, Vol. XVIII, pp. 26-39. 

It would seem that if the reliability of some of the tests of the
Downey group were increased, they might be used as a basis for
developing a battery of tests of value in the selection of student-teachers.
This appears to be true of the tests of speed of decision,
self-confidence, speed of movement, and flexibility and motor inhibi- 
tion, and also negatively in regard to volitional perseveration, finality 
of judgment, and non-compliance. However, the main problem would 
seem to lie in the relationships rather than in the separate scores, and 
the method of Poffenberger and Carpenter. 


SUMMARY 


1. Of the twelve tests of the Downey group, only four show even a 
slight relationship with success in practice teaching as measured by 
rating of expert judges. These with their correlations with this 
rating are: Speed of decision, +.27; self-confidence, +.20; speed of 
movement, +.19; and flexibility, +.18. Three show a very slight 
negative relationship: Volitional perseveration, —.19; finality of deci- 
sion, —.17; and non-compliance, —.17. Of all the coefficients that of 
speed of decision is the only one that is approximately four times the 
probable error. 

2. The correlations of intelligence with practice teaching, and 
scholarship with practice teaching are respectively +.38 and +.37. 
Between intelligence and scholarship for this group it is +.54. 

3. Hours of study seem to bear no relation to success in practice 
teaching. The coefficient is —.02. 

4. In the partial correlations of practice teaching with intelligence, 
scholarship and several of the Downey traits, the coefficients represent- 
ing the relation between practice teaching and speed of decision, 
practice teaching and self-confidence, and practice teaching and speed 
of movement, and practice teaching and motor inhibition are not 
reduced, or only slightly reduced when the factors of intelligence and 
scholarship are partialed out. However, intelligence and scholarship 
are each considerably reduced by the effect of each other. The inference
is that the Downey tests referred to do measure something not
included in intelligence or scholarship measures.

1 Poffenberger, A. T. and Carpenter, F. L.: Character Traits in School Success.
Journal of Experimental Psychology, Vol. VII, 1924, pp. 67-74.

5. When the two Downey trait scores showing the highest relationship
with practice teaching rating are jointly correlated with it, the
coefficient is +.33, the same as when their sum is so correlated. When
practice teaching rating is jointly correlated with intelligence, scholarship,
speed of decision and self-confidence, the coefficient is +.52,
which compares with the correlation between intelligence and scholarship
of +.54. Other multiple correlations, using the combinations of
practice teaching rating, intelligence scores and scholarship with
several of the Downey traits separately, give coefficients from +.44 to
+.49.
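
The partialing-out described in items 2 and 4 can be illustrated with the three zero-order coefficients reported in the summary (+.38, +.37 and +.54). The following is a minimal sketch using the standard first-order partial correlation and two-predictor multiple correlation formulas; the function names are hypothetical, and the values it produces are illustrative computations from the reported coefficients, not figures given by the author.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2, variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def multiple_r(r1a, r1b, rab):
    """Multiple correlation of variable 1 with the two predictors a and b."""
    r_squared = (r1a ** 2 + r1b ** 2 - 2 * r1a * r1b * rab) / (1 - rab ** 2)
    return math.sqrt(r_squared)


# Zero-order coefficients taken from items 2 of the summary.
r_practice_intelligence = 0.38
r_practice_scholarship = 0.37
r_intelligence_scholarship = 0.54

# Practice teaching vs. intelligence, scholarship partialed out.
pi_given_s = partial_r(r_practice_intelligence,
                       r_practice_scholarship,
                       r_intelligence_scholarship)

# Practice teaching vs. scholarship, intelligence partialed out.
ps_given_i = partial_r(r_practice_scholarship,
                       r_practice_intelligence,
                       r_intelligence_scholarship)

# Practice teaching predicted jointly from intelligence and scholarship.
r_joint = multiple_r(r_practice_intelligence,
                     r_practice_scholarship,
                     r_intelligence_scholarship)

print(round(pi_given_s, 3), round(ps_given_i, 3), round(r_joint, 3))
```

Both partial coefficients come out well below the zero-order values, which is in keeping with the statement in item 4 that intelligence and scholarship are each considerably reduced by the effect of the other.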








SEX DIFFERENCES ON ABILITY TESTS IN ART 
ALFRED S. LEWERENZ 


Statistician, Department of Psychology and Educational Research, Los Angeles 
City Schools 


Visual art is a field of instruction in which there has been little 
attempt made to apply objective methods of measurement. It is con- 
tended that art contains many elements which depend upon the 
authority of competent judges for their standing. Authority and 
experience, rather than demonstrable evidence, largely determine 
standards. Where such a condition prevails, it is pointed out, objec- 
tive measurement of values upon which there is no absolute agreement 
must necessarily proceed slowly. 

Examining the situation we find that instead of one there are at 
least two methods of applying our measuring devices. Dr. Thorndike 
utilized one plan in his scale for rating the drawing of children.¹ A
series of drawings depicting houses, landscapes, and faces, ranging 
from poor to good in execution, have been rated by judges. The 
samples are ranked on a large printed sheet much on the order of a 
handwriting scale. Dr. Thorndike measured the product or achieve- 
ment of the child as indicated by his finished drawing. The difficulty 
with such a scale is that it has a tendency to limit the subjects which a 
child may draw. For the accurate evaluation of drawings they should 
be all of the same type as those on the rating sheet. This method of
measuring pupil drawing ability has not, therefore, found favor with 
teachers of art. 

A second approach in securing a measure is to construct a series of 
tests which will evaluate certain abilities that condition success in art. 
These abilities will be in large part natural but somewhat subject to 
training. 

The Los Angeles Tests in Fundamental Abilities of Visual Art 
were constructed according to the latter plan.² They are designed
to measure abilities rather than the product of abilities. Nine tests 
are employed to measure seven abilities as follows: 





1 Thorndike, E. L.: The Measurement of Achievement in Drawing. Teachers 
College Record, Nov., 1913. 

2 Lewerenz, Alfred S.: “Scientific Measurement in the Realm of Art.” 1927
Yearbook of the Southwestern Educational Research and Guidance Association,
Research Service Co., Los Angeles, pp. 59.

Part I
1. Recognition of Proportion.
2. Originality of Line Drawing.

Part II
3. Observation of Light and Shade.
4. Knowledge of Subject-matter Vocabulary.
5. Visual Memory of Proportion.

Part III
6. Analysis of Problems in Cylindrical Perspective.
7. Analysis of Problems in Parallel Perspective.
8. Analysis of Problems in Angular Perspective.
9. Recognition of Color.

In securing norms the tests were given to approximately one
thousand unselected students ranging from Grade III through the
senior year in high school. In addition to the test data other informa- 
tion was secured for each pupil such as chronological age, mental age, 
intelligence quotient, semester grade in art (if any), nationality and sex. 

It will be the purpose of this paper to summarize the sex differences 
shown on the individual tests as revealed by the standardization data. 
The boys and girls were comparable, the median IQ for boys being 
100.6 and for the girls 100.2. The two distributions for IQ’s were 
very similar and closely approximated Terman’s distribution of 
nine hundred five unselected cases.¹ On the average the boys were
about two months older in mental and chronological age. 

In discussing the differences displayed by boys and girls on the 
tests there will be given: First, a brief description of the test; second, 
the number of cases involved in the median raw score and the inter- 
quartile range; third, observations based on graphs not reproduced 
here. 


Part I 


Test 1. Recognition of Proportion—The type of test chosen for 
the recognition of proportion is one of multiple choice with four 
response possibilities. There are fifteen sets of drawings with four 
pictures to a set. Each set is made up of four bowls, friezes, cornices, 
curves, still life compositions, etc., varying from bad to good in 
proportion and balance. The pupil marks with an X the picture in 
each group of four that he likes best and feels to be most pleasing. 





1 Terman, Lewis M.: “The Measurement of Intelligence.” Houghton, Mifflin
Co., Boston, 1916, pp. 66.

                            Boys      Girls
Number of cases              505        506
Median                       4.9        4.7
Quartile deviation   [illegible]        1.8


Remarks.—The median girl was exceeded by fifty-three per cent of the boys in
the ability to select shapes of good proportion. While the quartile deviations are
about the same for both, the graph shows that the girls made the majority of the
higher scores.
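
The three figures reported for each test (median, quartile deviation, and the per cent of one sex exceeding the median of the other) can be computed as in the following sketch. The score lists are invented for illustration and are not the Los Angeles data; the function names are likewise hypothetical, and the quartiles follow the default method of Python's `statistics.quantiles`, which may differ slightly from the hand method used in the original study.

```python
import statistics


def quartile_deviation(scores):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return (q3 - q1) / 2


def percent_exceeding_median(group, reference):
    """Per cent of `group` scoring above the median of `reference`."""
    reference_median = statistics.median(reference)
    above = sum(1 for score in group if score > reference_median)
    return 100 * above / len(group)


# Hypothetical raw scores, for illustration only.
boys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
girls = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7]

boys_median = statistics.median(boys)
girls_median = statistics.median(girls)
boys_q = quartile_deviation(boys)
girls_q = quartile_deviation(girls)
boys_over_median_girl = percent_exceeding_median(boys, girls)
girls_over_median_boy = percent_exceeding_median(girls, boys)

print(boys_median, girls_median, boys_q, girls_q)
print(boys_over_median_girl, girls_over_median_boy)
```

In this invented case the boys show the larger quartile deviation, so that a group may have the wider spread without having the higher median, a distinction the remarks on the several tests draw repeatedly.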


Test 2. Originality of Line Drawing.—On the test page are arranged 
ten sets of dots ranging in number from three to eighteen. These 
dots the student should incorporate in his drawings. To those who 
are original the test proves a stimulation for new concepts but those 
who are not thus gifted find themselves unable to surmount the 
obviousness of the dots and produce dot to dot drawings. The purpose 
of the test is to discover the student who is able to solve a drawing 
problem in an original manner. 


                            Boys      Girls
Number of cases              539        525
Median                       1.5        1.9
Quartile deviation           0.8       0.85


Remarks.—In this test girls are superior, as 60.1 per cent of the girls exceed the 
median boy. The graph shows that while boys make many more poor drawings 
than do girls they are, however, equal to or slightly better than the girls in the 
matter of superior originality. The girls have a tendency to make a great many 
average drawings as indicated by a prominent mode. 


Part II 


Test 3. Observation of Light and Shade.—The test is a variation of 
the completion type adapted to the technique of drawing. Pupils are 
required to indicate omission of shades and shadows in a series of 
drawings ranging from simple to complex. The idea has been 
employed in several intelligence tests in the same capacity, namely— 
as a measure of observation. 


                            Boys      Girls
Number of cases              511        507
Median                      14.0       12.0
Quartile deviation           3.3        3.7


Remarks.—Boys seem to show greater ability in observation as 58.3 per cent of 
the boys exceed the median girl. The girls display greater dispersion and a tend- 
ency to be bi-modal. The graph indicates clean cut superiority for the boys with 
both curves showing smooth distributions. Girls seem to lack somewhat the 
ability to scrutinize closely compositions of increasing complexity.


Test 4. Knowledge of Subject-matter Vocabulary.—The vocabulary
list that has been worked out has taken the form of a matching test.
The form has six sections made up of ten pairs of words in each. The
terms are those peculiar to the art room and are not of the general
vocabulary.

[Figure: Median Scores of Boys and Girls Compared, Tests]


                            Boys      Girls
Number of cases              434        437
Median                      12.3       11.7
Quartile deviation           3.2        2.8


Remarks.—The data seem to indicate that boys are slightly superior in their 
comprehension of technical vocabulary as 55.5 per cent of them exceed the median girl.
The graph indicates in this case a pronounced bi-modalism for the girls with but a 
trace for the boys. The response of the girls shows less dispersion than that of the 
boys. Judging from the graph the test has high selective power. 


Test 5. Visual Memory of Proportion.—In this test, the pupil has
an opportunity to observe for two minutes a black vase form mounted
on a white background. When the exposure time is up, the forms
are removed and the children draw the vase from memory. As an
aid to the pupil in drawing and to the examiner in scoring, the test
sheet has printed on it the top and bottom of the vase with a vertical
line through the center. The pupil draws but two lines, i.e., the sides
of the vase.


                            Boys      Girls
Number of cases              516        505
Median                      10.7        7.9
Quartile deviation           3.6        3.5


Remarks.—It would seem that boys are superior in the ability to draw correctly 
from memory as 56.4 per cent of the boys exceed the median girl. On the graph 
there was little difference in dispersion, the two curves being very similar. The 
test appears very selective at the upper end. It is interesting that the boys mani- 
fested more capacity for memory drawing than girls and it is probably due to that 
attention for detail which is revealed also in the tests for observation and analysis. 


Part III 


Tests 6, 7 and 8. Analysis of Problems in Perspective.—Tests have
been devised for analysis based on problems in perspective. Solving
the problems calls for the ability to analyze critically compositions of
some complexity. It is expressed in the student’s own work as a 
capacity for auto-criticism. 

Test 6 contains five pictures involving parallel or one point perspec- 
tive. The subjects are the familiar railroad track, a solid box, a 
transparent box, a room and a hallway. In each certain lines have 
been drawn out of perspective. To one with a true eye, these errors 
will stand forth. To one who has no logical sense of visual criticism 
the errors look natural and will be passed over. 


                            Boys      Girls
Number of cases              430        437
Median                       7.9        7.3
Quartile deviation           1.6        1.6


Remarks.—The data for this test favor the boys as can be seen by a comparison 
of the median scores. The median girl was exceeded by 59.8 per cent of the boys. 
The chart again shows the girls to be markedly bi-modal. It is interesting to note 
that the quartile deviations remain the same. 


Test 7 is made up of four pictures illustrating the principles of 
two point or angular perspective. The problems involve a single 
box, three boxes together, two books and a house.


                            Boys      Girls
Number of cases              375        356
Median                       3.1        2.7
Quartile deviation           1.1        1.5


Remarks.—In this test of analytical ability 60.3 per cent of the boys did better 
than the median girl. The graph shows that one or two elements are too easy but 
that the remainder of the problems make it, on the whole, a difficult test for nearly 
everyone. 


Test 8 partially covers the field of cylindrical perspective. The 
pictures deal with cylinders, flower pots, and the ellipses on a die, a 
ball, and a lighthouse. 


                            Boys      Girls
Number of cases              389        347
Median                       3.0        2.6
Quartile deviation           1.2        1.5


Remarks.—As the analysis grows more difficult the boys forged still more into 
the lead. The boys have 61.6 per cent of their number in advance of the median 
girl. The superiority of the boys is most pronounced in the higher scores. The 
girls show a greater dispersion. It appears that of the entire test series these 
analysis problems reveal the greatest contrast between boys and girls. This 
tendency is perhaps reflected in the dislike on the part of many women instructors 
for teaching perspective and the opposite attitude toward the subject displayed by 
most men teachers. 


Test 9. Recognition of Color.—The test for color recognition might
be described as being of the multiple-choice six response type with 
fifty questions, two of which are answered in the directions. A color 
chart is used in giving the examination. At the top of the chart are 
six known colors on standards: Red, orange, yellow, green, blue and 
violet. Below are given variations of these six standards with their 
intermediates together with their tints and shades, forty-eight 
unknowns in all divided into four sections. The child is asked to 
indicate on the form provided what he believes to be the predominant 
known color in each of the unknowns. 


                            Boys      Girls
Number of cases              538        529
Median                      32.0       33.0
Quartile deviation           4.0        3.6


Remarks.—As the above medians indicate, the girls, on the whole, show a 
physical capacity to recognize one more color out of forty-eight than do boys. 
This color superiority is also evidenced by the fact that 55.4 per cent of the girls 
exceeded the median boy. Girls show their greater ability in color recognition 
among those who have inferior color vision, for boys and girls with high color
discrimination are about equal in number. Government figures support the above
data, for it is estimated that, per thousand population, four women and forty men
are color blind.¹


The preceding data would seem to warrant the following 
conclusions: 


SUMMARY 


1. Girls appear to be superior in but two of the abilities measured, 
namely in originality and in color recognition. 

2. Girls manifest greater conservatism than do boys since they 
crowd the central tendency more closely. 

3. Boys, on the other hand, dare more and consequently do both 
better and worse than the girls. 

4. Boys have, in general, more ability to scrutinize and analyze 
than do girls while the latter partly compensate for this lack by having 
a greater sense of rhythm and color. 





1 Dane, J. M.: “The Problem of Color Vision.” Annual Report of Smithsonian
Institute, 1907, Part I.




A NOTE ON THE CORRELATION OF AVERAGES
HELEN M. WALKER 


Teachers College, Columbia University 


The interpretation of the meaning of the correlation between 
averages is of importance for educational research. If the correlation 
between averages is necessarily different from the correlation between 
the original scores contributing to those averages, then the conclusions 
reached by a large number of statistical studies are probably erroneous 
and may well be reconsidered. If the correlation between averages 
should prove, under certain conditions, to be the same as the cor- 
relation between the traits themselves, that fact would be of the 
greatest utility in many problems such as certain of those dealing 
with the standard error of the difference of two means. 

On page 23 of the “Twenty-seventh Yearbook of the National
Society for the Study of Education, Part I,” Miss Burks says: “It is
not uncommon to encounter material which presents the correlations
obtaining between average scores for a number of groups, each one of
which contains a large number of individuals . . . The correlation
coefficients become inflated when averages are employed because a
great many factors that ordinarily keep the correlation between
intelligence and education from being perfect cancel out (i.e., they are
approximately the same, taking the averages of entire states). Consequently,
these ‘inflated’ coefficients cannot be interpreted as ordinary
correlation coefficients can be. In fact it is almost impossible in most
cases to give any definite interpretation to them whatever.”

It is the purpose of this paper (1) to show mathematically that for
random samples the correlation between averages is, in general, the
same as the correlation between the original scores, (2) to discuss the
case where the samples are biased, and (3) to present experimental
data as a check on the theoretical conclusions.

It is well known that such formulas as the Spearman-Brown 
Prophecy Formula, the index of reliability, and most of the theorems 
concerning “true scores” on a test grow out of the general theorem for
the correlation between averages. Different assumptions applied 
during the course of the derivation produce different results, and the 
correlation of averages may, under different circumstances, be equal
averages is one thing when the scores composing each average are the
scores of unrelated individuals drawn at random from a large popula- 
tion; it may be something else when the scores composing each average 
are the scores of individuals in a group possessing a certain homo- 
geneity and different from the groups producing the other averages 
as, for example, in classes sectioned according to ability; and it is 
something very different still when the scores are merely different 
measures of the same trait, as when scores on different forms of the 
same test are averaged to give a more reliable measure of a single 
individual. These situations are so different that it is not satisfactory 
to say that in general the correlation between averages is necessarily 
greater than the correlation between original scores. 

Pearson has proved¹ that the correlation between means is equal
to the correlation between the traits for cases of simple sampling,
deducing this as a particular case of a more general theorem. The
same law is given by Kelley without proof on page 178 of his Statistical
Method, where he writes it: $r_{M_1 M_2} = r_{12}$. The proof is not
difficult and has probably been worked out by many people, but
to show how the formula is affected by different assumptions we will
present a derivation here.

Out of a large population, suppose that m samples of n cases each
have been drawn.

Let $\bar{X}$ and $\bar{Y}$ be the means for all the mn cases, and $\bar{X}_s$ and $\bar{Y}_s$
be the means for a particular sample of n cases.

Let $X_i$ represent any score, and $x_i = X_i - \bar{X}$.

Let $X_j$ represent any score other than $X_i$.

Now the mean of the scores is, in general, equal to the mean of the
means when the latter are weighted in proportion to the number
of cases contributing to them. In the present instance the same
number of cases contribute to each of the means.

1 “On the Probable Errors of Frequency Constants.” Biometrika, Vol. IX,
1913, p. 3. He first proves that the correlation between any two product
moments around a fixed origin is given by the formula

$$r_{p'_{qq'}\,p'_{uu'}} = \frac{p'_{q+u,\,q'+u'} - p'_{qq'}\,p'_{uu'}}{\sqrt{p'_{2q,\,2q'} - (p'_{qq'})^2}\;\sqrt{p'_{2u,\,2u'} - (p'_{uu'})^2}}$$

Since the two means are $p'_{10}$ and $p'_{01}$, this reduces to

$$r_{p'_{10}\,p'_{01}} = \frac{p'_{11} - p'_{10}\,p'_{01}}{\sqrt{p'_{20} - (p'_{10})^2}\;\sqrt{p'_{02} - (p'_{01})^2}}$$


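Let
$$\Delta\bar{X}_s = \bar{X}_s - \bar{X}.$$
Then
$$n\,\Delta\bar{X}_s = x_1 + x_2 + x_3 + \cdots + x_n,$$
and
$$n\,\Delta\bar{Y}_s = y_1 + y_2 + y_3 + \cdots + y_n.$$
$$n^2\,\Delta\bar{X}_s\,\Delta\bar{Y}_s = (x_1 + x_2 + \cdots + x_n)(y_1 + y_2 + \cdots + y_n) = \sum_1^n x_i y_i + \sum_1^{n(n-1)} x_i y_j.$$
Also
$$n^2\,\Delta^2\bar{X}_s = (x_1 + x_2 + \cdots + x_n)^2 = \sum_1^n x_i^2 + \sum_1^{n(n-1)} x_i x_j.$$

The correlation between the means of the m samples is

$$r_{\bar{x}\bar{y}} = \frac{\sum_1^m \Delta\bar{X}_s\,\Delta\bar{Y}_s}{\sqrt{\sum_1^m \Delta^2\bar{X}_s}\;\sqrt{\sum_1^m \Delta^2\bar{Y}_s}} = \frac{\sum_1^m \left( \sum_1^n x_i y_i + \sum_1^{n(n-1)} x_i y_j \right)}{\sqrt{\sum_1^m \left( \sum_1^n x_i^2 + \sum_1^{n(n-1)} x_i x_j \right)}\;\sqrt{\sum_1^m \left( \sum_1^n y_i^2 + \sum_1^{n(n-1)} y_i y_j \right)}} \tag{1}$$

This expression may be simplified by the application of any one of
various assumptions, the results differing with the choice of the
assumption.

(a) Assume that the samples are random samples. There is then no
systematic correlation among the various scores constituting each
sample, and it follows that $\sum x_i y_j = 0$, $\sum x_i x_j = 0$, and $\sum y_i y_j = 0$
except for chance fluctuations. The formula now reduces to

$$r_{\bar{x}\bar{y}} = \frac{\sum_1^m \sum_1^n x_i y_i}{\sqrt{\sum_1^m \sum_1^n x_i^2}\;\sqrt{\sum_1^m \sum_1^n y_i^2}} = r_{xy}. \tag{2}$$
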
It thus appears that for random samples the correlation between 
means is equal to the correlation between original scores within the 
limits of the fluctuation due to chance.

(b) Assume the samples to be slightly biased, the individuals composing
one sample having a tendency to resemble each other somewhat more
closely than they resemble the individuals in the other samples. This is
probably the situation when the various samples are classes which
have been sectioned according to ability, or when some of the samples
are taken from rural and some from urban schools. Assume now that
each individual in one sample is paired with every one of the other
(n − 1) individuals in that sample, and that the values of
$\sum_1^{n(n-1)} x_i x_j$, $\sum_1^{n(n-1)} y_i y_j$ and $\sum_1^{n(n-1)} x_i y_j$ are computed for that sample. When these are
summed for all m samples, it is improbable that any one of the summations
will be zero. If the three summations were each equal to
zero, then $r_{\bar{x}\bar{y}}$ would be equal to $r_{xy}$. If they are not all equal to
zero, the correlation between the means is likely to be somewhat
different from $r_{xy}$, although the writer can see no reason to expect
great divergence.

Under these circumstances the existence and amount of the inflation
depend upon the relationships existing between $r_{xy}$, $\sum x_i y_j$, $\sum x_i x_j$ and
$\sum y_i y_j$.

If within a particular sample the x score of each individual is
successively paired with the x score of each of the other (n − 1)
individuals, we may compute a correlation from these n(n − 1) pairs.
Suppose that such a correlation is found for each of the m samples, and
let $r_x$ designate the mean of these correlations. In the same way, if
the y score of each individual is paired with the y score of each of the
(n − 1) other individuals in the same sample, a correlation $r_y$ may be
obtained. Further, let the x score of each individual be paired with
the y score of each of the (n − 1) other individuals in the same sample,
and let the average of the m correlations so found be called $r_c$. Then
from equation (1) we have

$$r_{\bar{x}\bar{y}} = \frac{r_{xy} + (n-1)\,r_c}{\sqrt{1 + (n-1)\,r_x}\;\sqrt{1 + (n-1)\,r_y}} \tag{3}$$

From this it follows that $r_{\bar{x}\bar{y}}$ is greater than, equal to, or less than
$r_{xy}$ according to whether $r_c$ is greater than, equal to, or less than

$$\frac{r_{xy}\left(-1 + \sqrt{1 + (n-1)(r_x + r_y) + (n-1)^2\,r_x r_y}\right)}{n-1}.$$

This relationship becomes quite simple if $r_x = r_y$. In that case, the
correlation between the means is greater than, equal to, or less than
the correlation between the traits according to whether $r_c$ is greater
than, equal to, or less than $r_{xy} r_x$. If $r_c = r_x = r_y$, then we have the
Spearman-Brown prophecy formula as in (c) to follow. By a priori
reasoning, it seems to the writer highly improbable that $r_c$ should
exceed $r_x$ or $r_y$, and that therefore the value given by the Spearman-Brown
formula is probably a fairly safe upper limit for the correlation
between the means. In most cases where the sampling is only slightly
biased, it seems unlikely that $r_c$ will be much larger than $r_{xy} r_x$. This
is of course a matter to be determined by experimentation rather than
by a priori reasoning.
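
The step from equation (1) to equation (3) is not written out in the text. A sketch of it follows, on the assumption (not stated explicitly in the paper) that the standard deviations $\sigma_x$ and $\sigma_y$ may be treated as the same in every sample, and writing $r_c$ for the mean correlation between the x score of one individual and the y score of another individual in the same sample.

```latex
% Sums within one sample of n cases, expressed through correlations
\begin{align*}
\sum_1^n x_i y_i &= n\,r_{xy}\,\sigma_x\sigma_y, &
\sum_1^{n(n-1)} x_i y_j &= n(n-1)\,r_c\,\sigma_x\sigma_y,\\
\sum_1^n x_i^2 &= n\,\sigma_x^2, &
\sum_1^{n(n-1)} x_i x_j &= n(n-1)\,r_x\,\sigma_x^2,\\
\sum_1^n y_i^2 &= n\,\sigma_y^2, &
\sum_1^{n(n-1)} y_i y_j &= n(n-1)\,r_y\,\sigma_y^2.
\end{align*}

% Substituting in (1); the factor m n \sigma_x \sigma_y cancels
\[
r_{\bar{x}\bar{y}}
= \frac{m\,n\,\sigma_x\sigma_y\,[\,r_{xy} + (n-1)\,r_c\,]}
       {\sqrt{m\,n\,\sigma_x^2\,[\,1 + (n-1)\,r_x\,]}\;
        \sqrt{m\,n\,\sigma_y^2\,[\,1 + (n-1)\,r_y\,]}}
= \frac{r_{xy} + (n-1)\,r_c}
       {\sqrt{1 + (n-1)\,r_x}\;\sqrt{1 + (n-1)\,r_y}}.
\]

% Special case r_x = r_y = r: the square roots combine to 1 + (n-1) r, so that
\[
r_{\bar{x}\bar{y}} \gtreqless r_{xy}
\quad\text{according as}\quad
r_c \gtreqless r_{xy}\,r .
\]
```

The last line follows by setting the right-hand side of (3) equal to $r_{xy}$ and solving for $r_c$, which gives $(n-1)\,r_c = r_{xy}\,(n-1)\,r$.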

(c) Assume that X and Y represent any two measures of the same
thing. Then $r_{xy}$ is a reliability coefficient, and $r_{\bar{x}\bar{y}}$ is the result
obtained by correlating the mean of n measures of the variate with the
mean of n other measures of the variate. Instead of the mean of the
scores of n individuals, as in (a) and (b), we have now the mean of n
measures of the same individual. The formula ought obviously to
reduce to the Spearman-Brown Prophecy Formula. Assuming that
all forms of the test have the same mean and the same standard
deviation and that the correlation between any two forms is the same
as the correlation between any other two forms, we have

$$r_{\bar{x}\bar{y}} = \frac{\sum_1^m n^2\,r_{xy}\,\sigma_x\sigma_y}{\sum_1^m \left( n\,\sigma_x^2 + n(n-1)\,r_{xy}\,\sigma_x^2 \right)} = \frac{n\,r_{xy}}{1 + (n-1)\,r_{xy}}$$

as expected.
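
The three cases (a), (b) and (c) can be checked numerically against one another. The sketch below evaluates equation (3) directly; the function and argument names are hypothetical, with `r_c` standing for the mean cross-correlation between the x score of one individual and the y score of another, and the numerical values are chosen only for illustration.

```python
import math


def r_of_means(r_xy, r_c, r_x, r_y, n):
    """Equation (3): correlation between means of samples of n cases each."""
    numerator = r_xy + (n - 1) * r_c
    denominator = math.sqrt((1 + (n - 1) * r_x) * (1 + (n - 1) * r_y))
    return numerator / denominator


def spearman_brown(r, n):
    """Spearman-Brown prophecy formula for a test lengthened n times."""
    return n * r / (1 + (n - 1) * r)


# Case (a): random samples, so all within-sample correlations vanish.
case_a = r_of_means(r_xy=0.6, r_c=0.0, r_x=0.0, r_y=0.0, n=10)

# Case (b) with r_x = r_y: the break-even point is r_c = r_xy * r_x.
at_threshold = r_of_means(r_xy=0.6, r_c=0.18, r_x=0.3, r_y=0.3, n=5)
above_threshold = r_of_means(r_xy=0.6, r_c=0.25, r_x=0.3, r_y=0.3, n=5)
below_threshold = r_of_means(r_xy=0.6, r_c=0.10, r_x=0.3, r_y=0.3, n=5)

# Case (c): all correlations equal to the reliability coefficient.
case_c = r_of_means(r_xy=0.6, r_c=0.6, r_x=0.6, r_y=0.6, n=4)

print(case_a, at_threshold, above_threshold, below_threshold)
print(case_c, spearman_brown(0.6, 4))
```

The correlation of means equals the correlation of scores in case (a), may fall on either side of it in case (b), and agrees with the Spearman-Brown value in case (c).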

These three situations are so different that it is misleading to speak 
of them all as merely “correlations between averages.” In (a) and
(b) we are dealing with averages obtained by adding the scores of 
different individuals, as we might find the average of a class. In 
(c) we are dealing with averages each of which is merely a more reliable 
score for a single individual. In the latter case the correlation of 
means will be necessarily higher than the correlation of individual 
scores, but not so in the other cases. 

The foregoing discussion assumes that there is a uniform number
of cases in all the samples. In practice this is usually not so, but the
means come from groups of varying size. When the samples are
chosen at random, as in (a), this does not affect the argument. Just
how it may affect the conclusions under (b) and (c) I cannot say at
present.


EXPERIMENTAL INVESTIGATIONS 


Miss Elderton’s data for the heights and weights of 1402 boys 
between the ages of 8.5 and 9.5 years¹ were made the basis of an
empirical study to check the theoretical conclusions. For each of 
these 1402 boys a ticket was made showing his weight and height. 
These tickets were thoroughly shuffled together, and then five tickets 
were drawn at random from the box. The mean of the five weights 
on these tickets was recorded and the mean of the five heights. One 
hundred such drawings of five tickets each were made, care being taken 
to insure that no ascertainable bias should affect the drawings. Then 
a correlation was computed between the one-hundred mean heights 
and the one-hundred mean weights. The experiment was repeated 
with another hundred samples of five cards each. 
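
The ticket-drawing procedure just described can be imitated by simulation. The following is a minimal sketch, not a reconstruction of the original experiment: the population is artificial (1402 pairs drawn from a bivariate normal distribution with an assumed correlation of .70, standing in for the Glasgow heights and weights), and 2000 samples of five are drawn rather than 100 so that chance fluctuation is smaller. All function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_population(size, rho, rng):
    """Artificial (x, y) pairs with population correlation rho."""
    population = []
    for _ in range(size):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        population.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return population


def correlation_of_sample_means(population, n_samples, sample_size, rng):
    """Draw random samples of tickets and correlate their mean x with mean y."""
    mean_xs, mean_ys = [], []
    for _ in range(n_samples):
        tickets = rng.sample(population, sample_size)
        mean_xs.append(sum(t[0] for t in tickets) / sample_size)
        mean_ys.append(sum(t[1] for t in tickets) / sample_size)
    return pearson_r(mean_xs, mean_ys)


rng = random.Random(1928)
population = make_population(1402, 0.70, rng)
r_scores = pearson_r([p[0] for p in population], [p[1] for p in population])
r_means = correlation_of_sample_means(population, 2000, 5, rng)

print(round(r_scores, 3), round(r_means, 3))
```

Under these assumptions the two coefficients should differ only by an amount attributable to chance, as conclusion (a) leads one to expect.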

In order to see what might be the effect on the coefficient of cor- 
relation if the samples were made to exhibit definite bias, the 1402 cases 
were arranged in the order of their weights, two cases from the middle 
of the distribution being eliminated to have an even multiple of ten. 
The first ten cases on the list were then taken as the first sample, and 
their mean height and weight recorded. In this way one hundred and 
forty pairs of means were found, and the correlation between them was 
computed. Here we had samples showing extreme bias, and I 
expected to find the correlation between these means considerably 
higher than the correlation between height and weight for the original 
cases. It was actually slightly lower. In a similar fashion, biased 
samples of twenty-five cases each were made, and the correlation 
between their means computed. The results were as follows: 


For the entire population of 1402 individuals                         r = [illegible]
For the first hundred means of random samples of five cases each      r = .674 ± .037
For the second hundred means of random samples of five cases each     r = .726 ± .032
For one hundred forty means of biased samples of ten cases each       r = .681 ± .031
For fifty-six means of biased samples of twenty-five cases each       r = .710 ± .045
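
The figures following each coefficient agree with the conventional probable error of a correlation coefficient, .6745(1 − r²)/√N, where N is the number of pairs of means correlated. That this is the formula the author used is an inference from the numbers, not something stated in the text. A short check, with a hypothetical function name:

```python
import math


def probable_error_of_r(r, n_pairs):
    """Conventional probable error of a correlation coefficient."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n_pairs)


# (r, number of pairs of means) as given in the table of results.
reported = [(0.674, 100), (0.726, 100), (0.681, 140), (0.710, 56)]
probable_errors = [round(probable_error_of_r(r, n), 3) for r, n in reported]

print(probable_errors)
```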


It is possible that certain peculiarities of the distributions, in which
the processes of averaging scores which were themselves approximations
had left gaps with very low frequencies, may have tended to
reduce the value of r. In this experiment the bias was made much
more extreme than in most situations where the correlation of averages
is likely to be employed, and the outcome offers no evidence to indicate
that the correlation of averages is necessarily higher than the correlation
of individual scores. Such experimental data can, of course,
prove nothing, but they are in better accord with the results stated
under (a) and (b) than with the hypothesis that “the correlation
coefficient becomes inflated when averages are employed.” Variations
in r much greater than these might arise from the fluctuations of
chance alone.

1 “Height and Weight of School Children in Glasgow.” Biometrika, Vol. X,
1914-15, p. 305.








ON THE STANDARD ERRORS OF THE MEAN DUE 
TO SAMPLING AND TO MEASUREMENT 


C. L. HUFFAKER AND HARL R. DOUGLASS 
University of Oregon 


In certain treatises on statistical methods it is implied strongly
when not stated explicitly that the standard error of the mean of a
group of fallible measures, due to sampling, is $\sigma/\sqrt{n}$, that is, the
standard deviation of the group divided by the square root of the
number of individuals composing the group.¹

At least one writer has attempted to add to the error of the mean as
represented by $\sigma/\sqrt{n}$ an additional factor due to the lack of reliability
in the test or measuring instrument.² Though attention was called
at the time (1923) by Kelley³ to the fact that the error of measurement, as well
as the error of sampling, is already included in the formula under
discussion, statements such as those referred to above continue to
appear. The following derivation not only demonstrates the correctness
of Dr. Kelley’s criticism but enables us to see what is the error in
the mean due to each: (1) the chance error of measurement; (2)
the chance errors of sampling.
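
The point at issue can be illustrated by simulation. In the sketch below each obtained score is a true score plus an independent chance error of measurement; the standard deviations (10 for true scores, 5 for errors), the normal distributions, the group size of 25 and the function name are all assumptions made for the illustration and do not come from the paper.

```python
import math
import random
import statistics


def simulate_group_means(n_groups, group_size, true_sd, error_sd, rng):
    """Return (SD of the group means, SD of all obtained scores)."""
    means, all_scores = [], []
    for _ in range(n_groups):
        # Obtained (fallible) score = true score + chance error of measurement.
        scores = [rng.gauss(0, true_sd) + rng.gauss(0, error_sd)
                  for _ in range(group_size)]
        means.append(statistics.fmean(scores))
        all_scores.extend(scores)
    return statistics.pstdev(means), statistics.pstdev(all_scores)


rng = random.Random(0)
group_size = 25
sd_of_means, sd_obtained = simulate_group_means(4000, group_size, 10.0, 5.0, rng)

# Standard error from the obtained (fallible) standard deviation.
se_from_obtained = sd_obtained / math.sqrt(group_size)

# Standard error that would hold for true scores alone (sampling only).
se_true_scores_only = 10.0 / math.sqrt(group_size)

# Reliability under these assumptions: true variance / obtained variance.
reliability = 10.0 ** 2 / (10.0 ** 2 + 5.0 ** 2)
sd_true_recovered = sd_obtained * math.sqrt(reliability)

print(round(sd_of_means, 3), round(se_from_obtained, 3),
      round(se_true_scores_only, 3), round(sd_true_recovered, 3))
```

Under these assumptions the observed variability of the means agrees with σ/√n computed from the fallible scores and exceeds the value for true scores alone, so that no further allowance for unreliability need be added to the formula.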

Let us first consider a group of n cases with no error in measurement,
as would be the case where there was used a test of perfect reliability,
i.e., $r_{11} = 1$, where there were only sampling errors. Let $x'$ = true
score of an individual (in this instance also the obtained score)
expressed as deviations from the mean of N samples (a sufficient
number of samples to yield a mean of negligible error), and N =
number of such groups of which the mean is $M_t$,





1 See Garrett, Henry E.: “Statistics in Education and Psychology.” Longmans,
Green and Co., 1926, pp. 120-25. Also Otis, Arthur S.: “Statistical Method
in Educational Measurement.” World Book Co., 1926, p. 262. The probable
error of the mean of a distribution due to random sampling is as follows:

$$\text{PE (mean of a distribution)} = \frac{.6745\ (\text{st. dev. of dist.})}{\sqrt{\text{number of cases}}}$$

2 Holzinger, Karl J.: An Analysis of the Errors in Mental Measurement.
Journal of Educational Psychology, Vol. XIV, May, 1923, pp. 278-88.

3 Kelley, T. L.: Note upon Holzinger’s Formula for the Probable Error.
Journal of Educational Psychology, Vol. XIV, September, 1923, pp. 376-377.
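then

$$M_s - M_t = \frac{x'_1 + x'_2 + \cdots + x'_n}{n}$$

$\sigma_{M_s}$ (standard error of mean due to sampling) is given by

$$\sigma^2_{M_s} = \frac{1}{N}\sum_1^N (M_s - M_t)^2 = \frac{1}{N}\sum_1^N \left(\frac{x'_1 + x'_2 + \cdots + x'_n}{n}\right)^2$$

$$= \frac{1}{Nn^2}\left(\sum (x'_1)^2 + \sum (x'_2)^2 + \cdots + \sum (x'_n)^2 + 2\sum x'_1 x'_2 + 2\sum x'_1 x'_3 + \cdots + 2\sum x'_{n-1} x'_n\right)$$

$$= \frac{1}{Nn}\sum_1^N \sigma^2_{x'} + \text{mean cross products, of the nature } \frac{2\sum_1^N x'_1 x'_2}{Nn^2} \quad (1)$$

Since each of these cross products is equal to $2 r_{x'_1 x'_2}\sigma_{x'_1}\sigma_{x'_2}$ and
we know that $r_{x'_1 x'_2}$ is 0 except by chance, (1) becomes $\frac{1}{Nn}\sum_1^N \sigma^2_{x'}$.

$\frac{\sum \sigma^2_{x'}}{N}$, the average square of the standard deviations of each of the
N groups, should vary but very little from $\sigma^2_{x'}$ of the group actually
measured. Employing this close approximation we may write

$$\sigma^2_{M_s} = \frac{N\sigma^2_{x'}}{Nn} = \frac{\sigma^2_{x'}}{n} \quad\text{and}\quad \sigma_{M_s} = \frac{\sigma_{x'}}{\sqrt{n}} \ \text{(true scores)} \quad (2)$$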

It will be noted that this is the formula given by Yule,¹ but it is
important to note also that Yule’s derivation presupposes true scores
and not fallible test scores. He makes no mention of, nor provides for,
errors of measurement in his derivation, and the $\sigma$ in his $\sigma/\sqrt{n}$ is therefore
$\sigma_{x'}$ or $\sigma_x\sqrt{r_{11}}$.* It is quite possible that those who have taken $\sigma_{M_s}$ to be
$\sigma_x/\sqrt{n}$, in which $\sigma_x$ refers to fallible scores, have failed to note that Yule’s
$\sigma$ in $\sigma/\sqrt{n}$ is really $\sigma_{x'}$ (true scores). Of course, where $r_{11} = 1$, $\sigma_x$ and
$\sigma_{x'}$ are identical, i.e., $\sigma_x\sqrt{r_{11}} = \sigma_x$.





1 Yule, G. Udny: “An Introduction to the Theory of Statistics.” P. 344.
* Kelley: Op. cit., p. 213 [166]. 
Let us now consider a group of n scores of individuals, which is a
random sample of N such groups of equal size, but the scores of which
are supplied by measuring instruments of imperfect reliability, i.e.,
let $x'$ = the true ability (or score) of an individual, and $x$ = the actually
obtained score of the individual, both expressed as a deviation from
the true mean of all individuals in all N groups.

Let $e$ be that part of $x$ which is the result of chance errors of measurement
(the unreliability of the test) and $k$ = any constant error present
in the measurement.


The mean of any given group expressed as a deviation from the
mean of all random samples is

$$\frac{\sum x}{n} = \frac{\sum (x' + e + k)}{n}$$





The standard error of the mean due to all three types of errors,
namely, chance errors of sampling, chance errors of measurement, and
constant errors, is by definition

$$\sigma_{M_{s+e+k}} = \sqrt{\frac{1}{N}\sum_1^N \left(\frac{x'_1 + e_1 + k + x'_2 + e_2 + k + \cdots + x'_n + e_n + k}{n}\right)^2} \quad (3)$$





We should note that this formula takes into consideration both 
errors of measurement and errors of sampling. 

Expanding the numerator of the numerator: $\sum[(x'_1)^2 + (x'_2)^2 + \cdots + (x'_n)^2 + e_1^2 + e_2^2 + \cdots + e_n^2 + nk^2 + 2\sum(x'_1 x'_2 + x'_1 x'_3 + \cdots + x'_{n-1} x'_n + \text{cross products of terms involving } e \text{ and } k)]$.

$\sum x'e$ and $\sum x'k$ terms are equal to $r_{x'e}\sigma_{x'}\sigma_e$ and $r_{x'k}\sigma_{x'}\sigma_k$ terms
respectively, and $r_{x'e}$ and $r_{x'k}$ are both zero, since there can be no
correlation with a chance variable or a constant. Hence all cross
product terms except of the nature $\sum x'x'$ drop out. In addition, it
should be evident that except by chance there can be no correlation
between individuals of one random sample and those of other samples
as individuals, so that $2\sum x'x'$ terms are each equal to 0.

Hence (3) may now be written

$$\sigma_{M_{s+e+k}} = \sqrt{\frac{1}{N}\sum_1^N \frac{(x'_1)^2 + (x'_2)^2 + \cdots + (x'_n)^2 + e_1^2 + e_2^2 + \cdots + e_n^2 + nk^2}{n^2}} \quad (4)$$


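But $\frac{(x'_1)^2 + (x'_2)^2 + \cdots + (x'_n)^2}{n}$ of each group is equal to $\sigma^2_{x'}$
of that group expressed as deviations from the true mean, and

$$\sum_1^N \frac{(x'_1)^2 + (x'_2)^2 + \cdots + (x'_n)^2}{n^2} = \sum_1^N \frac{\sigma^2_{x'}}{n} = \frac{N\sigma^2_{x'}}{n}$$

and

$$\sum_1^N \frac{e_1^2 + e_2^2 + \cdots + e_n^2}{n^2} = \sum_1^N \frac{\sigma^2_e}{n} = \frac{N\sigma^2_e}{n}$$

where $\sigma_{x'}$ and $\sigma_e$ are the standard deviations of the true scores and of the errors
of measurement respectively of the measured group.

And $\sum_1^N \frac{nk^2}{n^2} = 0$ (since the standard deviation of any constant is 0).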
1 

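So that substituting in (4)

$$\sigma^2_{M_{s+e+k}} = \frac{1}{Nn}\left(N\sigma^2_{x'} + N\sigma^2_e + 0\right) = \frac{1}{n}\left(\sigma^2_{x'} + \sigma^2_e\right) \quad (5)$$

and

$$\sigma_{M_{s+e+k}} = \sqrt{\frac{\sigma^2_{x'} + \sigma^2_e}{n}} \quad\text{or}\quad \sqrt{\frac{\sigma^2_{x'}}{n} + \frac{\sigma^2_e}{n}} \quad (6)$$

We have shown $\sigma^2_{M_s} = \frac{\sigma^2_{x'}}{n}$ or $\sigma_{M_s} = \frac{\sigma_{x'}}{\sqrt{n}}$. By a similar proof $\sigma^2_{M_e}$
may be shown to be $\frac{\sigma^2_e}{n}$ as follows: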


Let $e$ = error in an $x$, expressed as a deviation from the mean of
such errors in N groups.

$M_e - M_{e_t}$ = the deviation of the mean error of any one group from
the mean error of the N groups,

$$\sigma^2_{M_e} = \frac{1}{N}\sum_1^N (M_e - M_{e_t})^2 = \frac{1}{N}\sum_1^N \left(\frac{e_1 + e_2 + \cdots + e_n}{n}\right)^2$$

$$= \frac{1}{Nn^2}\sum_1^N \left(e_1^2 + e_2^2 + \cdots + e_n^2 + 2e_1e_2 + 2e_1e_3 + \cdots + 2e_{n-1}e_n\right) = \frac{1}{Nn}\sum_1^N \sigma^2_e + 0 \quad (7)$$

(since each cross product term is of necessity equal to 0, since $r_{e_1e_2} = 0$).

And $\sigma^2_{M_e} = \frac{1}{Nn}\sum_1^N \sigma^2_e$.

We may take $\sigma^2_e$ as an approximation of $\frac{1}{N}\sum_1^N \sigma^2_e$ (the mean square
standard deviation of the errors of measurement) and obtain

$$\sigma^2_{M_e} = \frac{\sigma^2_e}{n} \quad\text{and}\quad \sigma_{M_e} = \frac{\sigma_e}{\sqrt{n}} \quad (8)$$

But $\sigma_e$ is the standard error of estimate when fallible scores are taken to be
true scores and is known to be $\sigma_x\sqrt{1 - r_{11}}$.¹























Consequently

$$\sigma_{M_e} = \frac{\sigma_x\sqrt{1 - r_{11}}}{\sqrt{n}} \quad (9)$$

We have shown $\sigma_{M_{s+e+k}} = \sqrt{\frac{\sigma^2_{x'}}{n} + \frac{\sigma^2_e}{n}}$, which we may now know to be

$$\sqrt{\frac{\sigma^2_x r_{11}}{n} + \frac{\sigma^2_x (1 - r_{11})}{n}} = \frac{\sigma_x}{\sqrt{n}}$$

It should be obvious that when $\sigma_x/\sqrt{n}$ is taken as $\sigma_M$, it refers to the
standard error of the mean resulting from both chance errors of measurement
and of sampling, and that to add to it for errors of measurement is
not logical. It should be noted also, in passing, that constant errors of
measurement produce no error in the mean if the constant error is
always present, but that if there is unknown constant error in the
sample only it is not possible to compute the standard error of the mean
due to constant errors or of any group of errors of which the constant
error is part.
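
The decomposition reached in formulas (2), (8) and (9) can be checked with a few lines of arithmetic. The following Python sketch is an illustration only and is not part of the original article: the function name `mean_standard_errors` and the sample values (an obtained standard deviation of 10, a reliability of 0.84, and 25 cases) are assumptions chosen so that the figures come out evenly. Constant error is ignored, as in the derivation.

```python
import math


def mean_standard_errors(sigma_x: float, r11: float, n: int) -> dict:
    """Split the standard error of a mean into its two chance components.

    sigma_x : standard deviation of the obtained (fallible) scores
    r11     : reliability coefficient of the test, between 0 and 1
    n       : number of individuals in the group
    """
    if not 0.0 <= r11 <= 1.0:
        raise ValueError("r11 must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")

    sigma_true = sigma_x * math.sqrt(r11)         # s.d. of true scores
    sigma_error = sigma_x * math.sqrt(1.0 - r11)  # s.d. of errors of measurement

    se_sampling = sigma_true / math.sqrt(n)       # sampling only
    se_measurement = sigma_error / math.sqrt(n)   # measurement only
    # The two chance components combine as a sum of squares.
    se_total = math.sqrt(se_sampling ** 2 + se_measurement ** 2)
    return {
        "sampling": se_sampling,
        "measurement": se_measurement,
        "total": se_total,
    }


# Hypothetical illustration: sigma_x = 10, r11 = 0.84, n = 25.
parts = mean_standard_errors(sigma_x=10.0, r11=0.84, n=25)
print(parts)
```

With these assumed values the total agrees with the obtained standard deviation divided by the square root of n (10/5 = 2), which is the point of the argument: the familiar formula already contains the error of measurement.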

The fact that we have three formulas for the standard error of the 
mean necessitates careful thinking as to which should be employed 
in any given instance. For example if one is concerned only with how 
reliably the mean of the obtained test scores of a class represents the 
true ability of that class in whatever the test measures and is not 
concerned with the question as to how reliably the obtained mean of the 
class represents the true mean of a much larger group of which the 
class is a sample, the correct formula to use is

$$\sigma_{M_e} = \frac{\sigma_x\sqrt{1 - r_{11}}}{\sqrt{n}}$$


1See Kelley, T. L.: Op. cit. 
2 Obtained by Kelley, T. L.: Op cit., pp. 83, 85. 
If on the other hand one is making generalizations about some
abilities or characteristics of all twelve year old boys, for example
(an indefinite number of them), from obtained scores, one must use the
formula

$$\sigma_{M_{s+e}} = \frac{\sigma_x}{\sqrt{n}}$$

An excellent example of the need for this distinction appears on
page 211 of Dr. Kelley’s “Interpretation of Educational Measurements”
in the form of formula (54) which is .260, = 13( ) ‘)


which may be written .260.>/7T, = 1.01175 — Te 


This formula is given by Dr. Kelley as a standard of reliability of 
tests necessary to assure the correctness of judgments concerning 
grade groups five times out of six with an accuracy of not more than 
one-half of the difference between successive half grade groups, 
e.g., 44 (mean of high sixth—mean of low sixth). 

If the problem is to pass judgment on only the group measured, that 
is to say what its educational status is, the formula given by Dr. 
Kelley includes too much. There is no error of sampling. The 


correct formula to use is .260,>/ry, = 1.01175 wwe i. 

If the task is to pass judgment on a much larger group of which the
group measured is a fair sample, the formula given by Dr. Kelley
should be used. The same distinction should be made with reference
to other constants, e.g., the $\sigma_\sigma$ and $\sigma_{mdn.}$; that is, the $\sigma_\sigma$ and $\sigma_{mdn.}$ resulting
from errors of measurement are respectively

$$\frac{\sigma_x\sqrt{1 - r_{11}}}{\sqrt{2N}} \quad\text{and}\quad \frac{1.25331\,\sigma_x\sqrt{1 - r_{11}}}{\sqrt{N}}$$

and not $\frac{\sigma}{\sqrt{2N}}$ and $\frac{1.25331\,\sigma}{\sqrt{N}}$, which, as may be seen from their
derivations,¹ include both errors of measurement and of sampling.
In passing it should be noted that these formulas for $\sigma_\sigma$ and $\sigma_{mdn.}$ should
not be employed with the same degree of freedom as that for $\sigma_M$, nor
with the freedom suggested by some texts on statistical methods.²





1 See Kelley: Op. cit., pp. 84-86. Note that it assumes $x$ to represent true
measures in the derivation, that $\sigma$ in $\sigma/\sqrt{2N}$ refers also to $\sigma$ of true measures or
$\sigma_x\sqrt{r_{11}}$. See Yule: Op. cit., pp. 337-338, where $\sigma$ in $\sigma_{mdn.} = 1.25331\,\sigma/\sqrt{n}$ refers
to $\sigma$ of measures in which there is no error of measurement.
2 See Garrett: Op. cit., p. 127 and problem 1, part 3, p. 146.
The formula $\sigma_\sigma = \sigma/\sqrt{2N}$ is a special case of the general formula for
the standard error of a standard deviation, as is evident from its derivation,¹
and is to be employed only when the conditions assumed in the
derivation of the special form are present, namely that the $\sigma$ of which
the standard error is desired must be the $\sigma$ of a normal distribution.

Likewise one is more restricted in the use of the formula $\sigma_{mdn.} = 1.25331\,\sigma/\sqrt{N}$
than some discussions suggest, as is evident from the derivation
of the formula and the assumptions underlying it.²





1See Kelley, T. L.: Op. cit., pp. 85-86. 


2 Yule: Op. cit., pp. 337-338; or see Brown, Wm.: “Essentials of Mental
Measurement.” 1911, p. 61.











A NEW APPARATUS FOR PLOTTING AND A CHECKING 
METHOD FOR SOLVING LARGE NUMBERS OF 
INTERCORRELATIONS 


L. DEWEY ANDERSON 
Bureau of Educational Experiments, New York City 
AND 
HERBERT A. TOOPS 


Ohio State University 


The plotting of intercorrelations and the subsequent steps in 
calculation are very tedious, and at times the outlay in effort seems to 
be much greater than the value of the results achieved. Elimination 
of unnecessary steps by the use of additional apparatus, stencils, and 
checking formulas, which increase both the accuracy and speed of 
plotting and solving of intercorrelations is becoming more and more 
important. 

During the years 1923-27, the writers conducted a research for 
which many thousands of correlations were solved. It was impera- 
tive that the most economical correlation methods be discovered. 
Of the many different plotting methods which were tried out, the one 
here described takes the least time and requires the least apparatus. 
Hitherto, in the improvement of correlation methods, the modifica- 
tion of individual correlation plots, the use of gross score formulas and 
operations checks, and the shortening of the mathematical procedure, 
have been emphasized. Little has been done in determining the best 
and shortest method for plotting the data sheet. The method described 
makes it possible for the individual research worker to plot inter- 
correlations of up to twenty-five variables without a large expenditure 
of time and funds since the apparatus is easily constructed and will 
not exceed $5.00 in cost. The time studies (reported at the end of 
the article) indicate that the cost per correlation is markedly reduced. 

The first section of this report describes the plotting apparatus 
and the second section describes the method to be used in solving the 
correlations, together with the checking methods, which are applied 
at every step. The solving method which has been applied is similar 
to the one described by Toops in the two articles to which reference 
is later made. 
PART I. THE PLOTTING APPARATUS


The plotting apparatus described here has the following advantages: 
(1) No printed job sheets are necessary; (2) all the correlation plots of 
one variable with all others with which it is correlated are located on 
only one sheet; (3) the accuracy of plotting is increased since the 
registering of a tally mark is made by reference to one scale only— 
rather than two—and that one located always adjoining the compart- 
ment where the tally mark is to be located; and (4) the actual plotting 
time is shortened considerably. The maximum capacity of the 
apparatus is twenty-four plots at a time. If intercorrelations on more
than twenty-five variables are being obtained, double runs can be 
made. The time-saving qualities of the method become evident 
when the intercorrelations of three or more variables are plotted. 

In any method of plotting intercorrelations, the following represent 
the minimum essentials of the plotting paper (if a calculating machine 
is available for use): 


1. A plotting space, say, 18 × 18 steps, for recording tally marks
in a scattergram. 


2. Space on two sides of the chart for recording X- and 
Y-frequencies. 

3. Space on two sides of the chart for recording diagonal frequencies. 

4. A method of quickly locating the proper column and row, i.e.,
compartment for recording a given tally mark. 

5. A method of identifying marginal and diagonal frequencies 
with respect to the step-values they bear for making the extensions of 
steps by frequencies. 

6. A method of checking the plotting to know that it is correct. 

In the design of this new apparatus, all of the above requirements 
have been taken into account. The method here described is based on 
the usual one used for the plotting of correlations in which tally marks 
are placed in a square arrangement, which is divided into 18 × 18
steps, to indicate the individuals’ scores in both variables under con- 
sideration. This apparatus simply adds a number of devices to 
facilitate the process of plotting by making it possible to plot several 
scattergrams simultaneously, thus eliminating waste of time and 
effort. In addition to the above named requirements the following 
additional practical considerations have been taken into account: 

1. In order to speed the work of computation and to prevent loss 
of data sheets, it is desirable that all scattergrams in which X is a 
common or constant variable be located together on one sheet. 
2. One cannot economically rule by hand the vertical and horizontal
lines on this paper.¹ Therefore commercially ruled cross section
paper, which is available at most stationery stores, is used.

3. The necessary additional hand ruling of the paper, with pencilled 
lines, should be a minimum in amount, should not require extensive 
laying out or measuring, and should, if possible, permit being laid out 
with a pattern or templet which will practically preclude the possibility 
of error. 

4. The plotting compartments should be at least one-quarter 
inch square. 

All of the above have been incorporated into a unitary scheme. 

It is the viewpoint of the great majority of statisticians actually 
engaged in correlation work, and also of those people who have devised 
correlation blanks that when several scattergrams are to be plotted 
with the same set of variables it is time-saving to convert the raw 
scores into small transmuted scores ranging, say, from 0 to 17.²

It has been known for some time that, having such transmuted 
scores, one may mechanically find the column on the several plots (or 
rows, if a different method is used) containing person A’s X-score, say 
his score on test 1, and then while forgetting the X-score (which is 
mechanically “remembered” by the machine) one may plot A’s
successive Y-scores on tests 2, 3, .. . n, by merely locating the 
proper rows on the successive charts, depending on the magnitude of 
the several Y-scores successively, and entering a tally mark in each 
selected row of the successive charts.³ Once the column has been
determined, a reader may read off to the plotter in rhythm 3 or 4
Y-scores, which are then tallied on the successive charts and called back
by the tally-marker to the reader for verification. This has been 
worked out by Toops on polar coordinate paper; but the polar method, 
while rapid, is eye-straining and fatiguing, and for small N’s a dispro- 
portionate amount of time is spent in tacking the printed sheets to the 
rotating table. Toops and Miner use a T-square on which the X- 
classifications for several charts, placed end to end, are pasted on 





1 If one has considerable work of this kind, it would pay to have paper specially 
ruled for this purpose, using red lines to distinguish the border lines of adjoining 
scattergrams. 

2 Toops, H. A.: Computing Intercorrelations of Tests on the Adding Machine. 
Journal of Applied Psychology, Vol. VI, No. 2, 1922, pp. 172-184. 

3 Toops, H. A.: Two Devices for Aiding Calculation. Journal of Experimental 
Psychology, Vol. IX, No. 1, Feb., 1926, pp. 64-66. 
the T-square blade. This requires some time in the alignment of the
charts, and fifteen intercorrelations is the maximum capacity of the
method. The T-square also has to be re-located at the proper column,
depending on the X-score, for every successive column of five charts,
which is a time-consuming operation and also a possible source of
plotting error. This difficulty is avoided in the newer apparatus
since the X-variable score is located for all the plots at one and the
same time.

The apparatus, a section of which is shown in Fig. 1, consists of a 
drawing board (A), twenty-five inches wide and forty-three inches long 
and a frame made of four boards, two of which parallel the top and 
bottom, and two of which parallel the right and left sides of the 
drawing board. This frame is so built that the horizontal boards 
(the top board of which is shown and marked E in the illustration)
have a bearing against the top and bottom of the horizontal edges of 
the drawing board while the vertical supports (of which the left one 
is shown and marked C in the illustration) rest upon the drawing board. 
The horizontal boards are two inches wide, forty-three inches long and 
one inch thick. The vertical boards (or cross boards) are two inches 
wide, twenty-nine inches long and one inch thick. These last pieces 
which are bolted at the ends to the horizontal pieces make the frame 
solid. In other words, we have a framework something like a T-square 
with two heads, one on each end of the blade. G₁, G₂ (to G₆) are
metal (tin) guides twenty-nine inches long and one-half inch wide
which are fastened at the two ends to the horizontal pieces (E) of the
framework and so run vertically across the drawing board, or perpen- 
dicular to the horizontal edges of the board. Each of the metal guides 
carries four complete Y-classifications (steps 0 to 17) one above 
another. A sheet of quarter-inch coordinate paper, twenty-two 
inches by thirty-four inches is thumb-tacked on the board in align- 
ment with the Y-classifications of the metal guides. The metal 
guide strips are so spaced and are of such a width (one half inch) that 
by drawing pencil lines to the left and to the right of each, the vertical 
outlines of the several charts are all laid out. By rotating the frame- 
work through an angle of ninety degrees relative to the drawing 
board, the horizontal lines may then be pencilled in by using the 
guide strips in similar fashion. 

To use the board it is necessary that the raw scores be transmuted
into class interval scores of 0 to 17 or less.¹ It is not necessary, of
course, that there be fully eighteen class intervals. A high score in the
variable is given a high transmuted score and vice versa. The
transmutation of scores takes but very little time. The writers found
that the time required to transmute the scores of one hundred individuals
on twenty-five variables is 3½ hours or .7 of a minute per
correlation.
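
The figure of .7 of a minute per correlation can be checked by counting the intercorrelations among twenty-five variables. The short Python sketch below is an added illustration, not part of the original report; the function name `minutes_per_correlation` is hypothetical, and the total time is read as three and a half hours, which is an assumption about a figure that is garbled in the scanned text but is the value consistent with the stated rate.

```python
def minutes_per_correlation(n_variables: int, total_hours: float):
    """Return (number of intercorrelations, minutes spent per correlation)."""
    # Every pair of distinct variables yields one correlation.
    n_correlations = n_variables * (n_variables - 1) // 2
    return n_correlations, total_hours * 60.0 / n_correlations


# Twenty-five variables, with an assumed transmutation time of 3.5 hours.
n_corr, minutes = minutes_per_correlation(25, 3.5)
print(n_corr, round(minutes, 2))
```

Twenty-five variables give 300 correlations, so 210 minutes of transmutation work amounts to seven-tenths of a minute for each.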

The transmuted scores are recorded as indicated in Table I; 
the name of the individual is at the left and his transmuted scores in 
the several variables are entered consecutively across the page. 

From this point on, it is necessary that two people work together. 
One reads the scores aloud; and the other records them. 





1 Toops, H. A.: Computing Intercorrelations of Tests on the Adding Machine. 
Journal of Applied Psychology, Vol. VI, No. 2, 1922, p. 172ff. 
TABLE I.—THE ARRANGEMENT OF THE TRANSMUTED SCORES FOR PLOTTING

After a sheet of coordinate paper has been tacked on the board and
the boundary lines for the separate plots have been pencilled in as
already indicated, the marker writes directly upon the successive
columns in plot r₁₂ (see Fig. 1, upper left hand corner) the numbers
0-17. These are the X steps. It is not necessary to enter these on more
than one plot since all X classifications are automatically taken care of
by the vertical guides, i.e., when the first guide strip, G₁, is located
for a certain X transmuted score, the other guide strips are located
correctly for the corresponding columns of all the correlation plots.
The actual plotting now starts. The reader calls off John Doe’s
record in variable 1, which is 10, and the marker moves the entire
sliding frame until guide G₁ is located just to the left of column 10,
as indicated by the classifications at the top of plot r₁₂. This operation
lines up all the guide strips to the left of column 10 in all the
twenty-four plots. In other words, one movement of the sliding
frame makes simultaneous alignment for all plots. The X position of
John Doe on all the plots is thus located once for all and the frame is
left in this position until twenty-four tally marks are recorded, one on
each of the plots. The common X-class can henceforth be forgotten
since it is mechanically “remembered” by the frame, and the scorer
can fix all of his attention on the simple task of locating the tally
marks on the several plots (the Y-scores or the scores on variables
2, 3, 4, . . . 25). All the marker has to do is to place tally marks
in the columns directly to the right of the guide strips in the actual
Y location as given him by the reader.

After the X classification is located the reader calls off the Y-scores
for variables 2, 3, 4 and 5, and the marker records the score for
variable 2, which is 17, opposite the top 17 on guide strip G₁ in the space
directly to the right of the guide strip. He then goes down to plot
r₁₃ and enters a tally mark opposite transmuted score 14 (Doe’s score
in variable 3). Tally marks for variables 4 and 5 are recorded on
plots r₁₄ and r₁₅ respectively. As the marker records the scores, he
calls them back to the reader who mentally checks them for accuracy.
After tally marks are placed for the first four plots, the reader calls
the scores for variables 6, 7, 8 and 9, and they are recorded and called
back by the marker. This operation is similarly repeated for all the
columns of correlation plots. Now tallies have been inserted for
variables 2 to 25 with variable 1. The reader then reads the X
(variable 1) score for Richard Fish in the same manner, and the
marker first makes sure that the framework is located so that the
guide strip is opposite the X classification of 4 (Fish’s score on variable
1). This process is continued for all individuals. It is usually quickest
for the reader to call the Y-scores off in groups of four, in an accentuated
rhythm, thus: “four, three . . . six, ten.” This corresponds
to the grouping of the plots by fours in vertical columns. The
correlations r₁₂, r₁₃, r₁₄, r₁₅ are plotted by means of the first guide strip;
r₁₆, r₁₇, r₁₈, r₁₉ by means of the second guide strip; and so on. The
appearance of a completed plot is shown in r₁₂ of Fig. 1.
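
The tallying routine just described amounts to fixing one person's X step and then dropping one tally into each of the remaining plots. The Python sketch below is an added illustration of that bookkeeping under stated assumptions, not a description of the apparatus itself: the function name `plot_against_first` and the record layout are hypothetical, only five variables are shown instead of twenty-five, and apart from Doe's scores of 10, 17 and 14 and Fish's score of 4 on variable 1, the scores are invented.

```python
STEPS = 18  # transmuted scores run from 0 to 17


def plot_against_first(records):
    """Tally variable 1 (X) against every other variable (Y).

    records maps a person's name to a list of transmuted scores,
    the first entry being the score on variable 1.
    Returns {variable number: 18 x 18 grid}, indexed grid[y][x].
    """
    n_vars = len(next(iter(records.values())))
    plots = {
        j: [[0] * STEPS for _ in range(STEPS)] for j in range(2, n_vars + 1)
    }
    for scores in records.values():
        x = scores[0]  # the frame is set once for this X column
        for j, y in enumerate(scores[1:], start=2):
            plots[j][y][x] += 1  # one tally per plot, in row y
    return plots


# Doe's first three scores and Fish's first score follow the text;
# every other value here is made up for the illustration.
records = {
    "John Doe": [10, 17, 14, 9, 12],
    "Richard Fish": [4, 6, 3, 8, 5],
}
plots = plot_against_first(records)
print(plots[2][17][10], plots[3][14][10])
```

The outer loop corresponds to one setting of the sliding frame per individual, and the inner loop to the reader calling off the successive Y-scores.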

Upon completion of the plotting of any one variable with all the 
others, (in Fig. 1, variable 1 with the other twenty-four variables), 
the sheet is taken from the drawing board and another tacked on in 
its place. In actual practice ten or twelve sheets can be tacked on 
the board at the same time. The sheets are removed when necessary 
by tearing away the paper at the corners from the thumb tacks. This 
saves considerable time since the aligning of a sheet to the board is a 
time consuming operation. 

At this point any method of solving the correlations can be used. 

In intercorrelation computations one always has standard marginal
frequencies for X and for Y. The various summations, ΣX, ΣY,
ΣX², ΣY², can be computed and checked once for all, there being only
2n of them in all. Thereafter if the X- and Y-marginal frequencies
check up with the standard frequencies there is no need for recomputation
of these quantities.

Each of the intercorrelations has one unpredictable series of diagonal
extensions of frequencies which are to be multiplied by squares of the
respective diagonal steps. The sum of these extensions yields Σ(X − Y)²,
which must be computed for each correlation plot. The method next
to be outlined for solving correlations uses job sheets, which speeds
up the work considerably. If other methods are used, it is necessary
either (1) to separate the several sheets by cutting so that the individual
data sheets can be pasted or pinned on correlation charts or other
coordinate paper, or (2) to paste a sheet of paper four inches by twenty-one
inches for making extensions to the right of each vertical
set of plots so that it can be folded back over the plots when the solving
is completed. This permits the correlations to be worked directly
by any method selected and has the advantage of having all the data
for correlations of one variable with those with which it is paired in one
place.
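
The reason a single diagonal sum suffices for each plot is the algebraic identity ΣXY = [ΣX² + ΣY² − Σ(X − Y)²] / 2, which lets the product-moment coefficient be found from the standard marginal sums and Σ(X − Y)² alone. The Python sketch below illustrates that identity; it is an added example, not the authors' job-sheet procedure (which belongs to the continuation of the article), and the function name `r_from_sums` and the five pairs of scores are hypothetical.

```python
import math


def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_d2):
    """Product-moment r from marginal sums and the diagonal sum.

    sum_d2 is the sum of (X - Y) squared, obtained from the diagonal
    frequencies; the cross-product sum is recovered from it.
    """
    sum_xy = (sum_x2 + sum_y2 - sum_d2) / 2.0
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Invented transmuted scores for five individuals.
xs = [10, 4, 7, 12, 9]
ys = [17, 6, 9, 15, 11]

r = r_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_x2=sum(x * x for x in xs),
    sum_y2=sum(y * y for y in ys),
    sum_d2=sum((x - y) ** 2 for x, y in zip(xs, ys)),
)
print(round(r, 4))
```

Only the last argument changes from plot to plot; the four marginal sums for a given variable are computed once and reused, as the text points out.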


(To Be Continued.) 














A COMPARISON OF “APTITUDE” AND “TRAINING” 
TESTS FOR PROGNOSIS¹


T. A. LANGLIE 
University of Minnesota 


This study is an analysis of the results obtained by giving the 
Iowa Placement examinations² in English, mathematics, and chemistry
to some three hundred freshmen in the College of Engineering at the 
University of Minnesota. More particularly, the purpose of this 
study is to evaluate the aptitude and training tests in relation to 
each other, as devices for predicting scholarship in courses in 
engineering. 

Presumably, the training tests are measures of achievement while 
the aptitude tests are measures of special native and acquired capacities 
to achieve. The 1924 edition of the Iowa tests included the following 
content: 


English, training—series ET1
Part 1. Spelling 
Part 2. Punctuation and sentence structure 
Part 3. Grammar 
Part 4. Clearness, emphasis and force 
English, aptitude—series EA1
Part 1. Ability to comprehend and apply rules 
Part 2. Ability to secure correct ideas from English textbook material 
Part 3. Reading comprehension 
Part 4. Literary appreciation 
Mathematics, training—series MT1 
Part 1. Fundamentals of arithmetic 
Part 2. Fundamentals of formal algebra 
Part 3. Fundamentals of geometry 
Part 4. Algebraic reasoning problems 
Mathematics, aptitude—series MA1 
Part 1. Arithmetic and algebraic number series 
Part 2. Constructive imagination 
Part 3. Pure and mathematical logic 
Part 4. Mathematics reading comprehension 





1 The writer is indebted to O. M. Leland, Dean of the College of Engineering,
University of Minnesota, for access to the data of this study, and to Donald G.
Paterson for advice and guidance in the analysis of the data.
2 Seashore, C. E., and Ruch, G. M.: “The Iowa Placement Examinations,
Edition 1924.” University of Iowa.

Chemistry, training—series CT1
Part 1. Knowledge of fundamentals of chemical processes 
Part 2. Valence, formulas, equations 
Part 3. Manufacturing processes, applied chemistry 
Part 4. The fundamental chemical problem 
Chemistry, aptitude—series CA1 
Part 1. Simple arithmetic of chemistry 
Part 2. Ability to secure precise data from chemical paragraphs 
Part 3. Chemistry reading comprehension 
Part 4. Interest in chemistry (as measured by accuracy of common information)


These tests were given to the entire freshman class during the 
first week of the fall quarter. Each test requiring one hour for 
administration was given to the entire group at the same time. The 
criteria with which these test scores are compared are grades given 
the students at the end of the fall quarter of 1924. Individual course 
grades are designated in the customary manner by letter grades from 
A to F, E being a conditional grade. For the purposes of this study, 
E is considered the same as an F.¹ Total scholarship is represented
by “honor points.” One honor point is granted for every credit of
C work, two honor points for every credit of B, and three for every
credit of A. An F is given a negative value of one point, and D is
considered as zero. Thus, a student who attains five credits A, five
credits B, and five credits C, is given a score of (5 × 3) + (5 × 2) +
(5 × 1) or 30. These scores are used as the criteria for evaluating
the tests of training and aptitude. 
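
The honor-point scale can be stated compactly. The Python sketch below is an added illustration; the names `POINTS` and `honor_points` are hypothetical, and the scale follows the description above, with E scored the same as F.

```python
# Honor points per credit for each letter grade; E is treated as F.
POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "E": -1, "F": -1}


def honor_points(credits_by_grade):
    """Total honor points for a mapping of letter grade -> credits earned."""
    return sum(POINTS[grade] * credits for grade, credits in credits_by_grade.items())


# The worked case from the text: five credits each of A, B and C.
example = honor_points({"A": 5, "B": 5, "C": 5})
print(example)
```

For the case given in the text this returns 30.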


RESULTS AND DISCUSSION 


Table I presents coefficients of correlation between the placement 
tests and the particular subjects for which the tests are devised. 

Evidently these tests have a fairly satisfactory value as predictive 
devices. Correlations of +0.40 and +0.50 between tests and single 
subjects are fairly high, and two of these tests correlate above +0.50 
with their related courses. They are English training with grades in 
English, and chemistry training with grades in the two chemistry 
courses, numbers 4 and 14. The average of the correlations for the 





1Grades E and F are combined arbitrarily because it was desired to have a 
value which represented the student’s work for the quarter only, not the grade he 
received after additional study. The number of E’s was small and the effect of 
their presence is negligible as it concerns this study. 
training tests is +0.516 but of the aptitude tests only +0.390. This
difference indicates that the training tests enable us to predict more 
accurately the grade a student will get in a particular course. These 
course grades have a fair degree of reliability also, as measured by 
correlating mid-quarter grades with final grades. The reliability 
coefficients range from +0.64 for English to +0.79 for chemistry 4. 


TABLE I.—COEFFICIENTS OF CORRELATION BETWEEN APTITUDE AND TRAINING
TESTS AND SCHOLARSHIP IN PARTICULAR SUBJECTS¹

Subject            N      r with aptitude test    r with training test
English 4          320    +0.46 ± 0.035           +0.59 ± 0.025
Mathematics 9      170    +0.26 ± 0.050           +0.45 ± 0.037
Mathematics 11     160    +0.36 ± 0.040           +0.45 ± 0.037
Chemistry 4         74    +0.39 ± 0.067           +0.55 ± 0.047
Chemistry 14       170    +0.48 ± 0.040           +0.54 ± 0.038
Average                   +0.390                  +0.516











1 The course numbers are as they appear in the University Bulletin. English 
4 is a three credit course in composition. Mathematics 9 is higher algebra. 
Mathematics 11 is college algebra. Both chemistry 4 and 14 are general inorganic 
of 4 and 5 credits respectively. 


Table II presents coefficients of correlation between the several 
placement tests individually and total scholarship scores, as represented 
by the number of honor points earned. Multiple correlation coeffi- 
cients are also given in this table, between the three training tests 
and total scholarship, between the three aptitude tests and total 
scholarship, and between all six placement tests and total scholarship. 


TABLE II.—COEFFICIENTS OF CORRELATION BETWEEN THE PLACEMENT TESTS AND
TOTAL SCHOLARSHIP (HONOR POINTS)

N = 241

Aptitude test                 r with scholarship    Training test                 r with scholarship
English                       +0.39 ± 0.035         English                       +0.44 ± 0.035
Mathematics                   +0.47 ± 0.034         Mathematics                   +0.54 ± 0.031
Chemistry                     +0.44 ± 0.035         Chemistry                     +0.40 ± 0.036
Average                       +0.433                Average                       +0.460
Multiple (all aptitude tests) +0.520 ± 0.032        Multiple (all training tests) +0.606 ± 0.027

Multiple (all tests)          +0.607 ± 0.027

These coefficients, also, are fairly high. A correlation of +0.54,
as obtained between the mathematics training test and total scholar- 
ship is very good for a one hour test. The best aptitude test for this 
purpose is the mathematics examination which correlates +0.47 with 
scholarship. Chemistry aptitude, which is a better instrument for 
prediction than chemistry training, shows the only reversal of form. 
As a rule, the training tests are better devices by which to predict 
subsequent performance, although the differences are not generally 
statistically significant. The average of the training test coefficients 
is +0.460 while the average of the aptitude coefficients is +0.433. 
The multiple correlation between the three aptitude tests and
scholarship (+0.520) is lower than the multiple correlation between the
three training tests and scholarship (+0.606).
Indeed, adding the aptitude tests to the training test multiple raises 
the coefficient to only +0.607. It seems evident that the aptitude 
tests add nothing to the training tests for purposes of predicting 
scholarship—nothing, that is, except reliability. The reversal of 
the general rule in the chemistry tests may be attributable to the 
greater specificity of content of the training test and to a greater lack 
of any knowledge of chemistry on the part of many freshmen. This 
view is substantiated by the form of the distribution of scores on this 
test, a noticeable bi-modality being present. Such a distribution 
seemingly does not affect the value of the test for predicting grades in 
chemistry, however, as already evidenced in Table I. 
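The figure printed after each coefficient in Table II appears to be a probable error. The study does not state the formula used; assuming the classical expression PE = 0.6745(1 - r²)/√N, a short sketch reproduces several of the tabled values for N = 241. The function name is illustrative only.

```python
import math

def probable_error_r(r: float, n: int) -> float:
    """Classical probable error of a correlation coefficient r
    computed from n cases (an assumed formula, not one the study cites)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

N = 241  # number of students in Table II

coefficients = [
    ("mathematics training test", 0.54),
    ("mathematics aptitude test", 0.47),
    ("multiple, all training tests", 0.606),
]

for label, r in coefficients:
    print(f"{label}: r = {r:+.3f}, PE = {probable_error_r(r, N):.3f}")
```

Under this assumption the three probable errors round to 0.031, 0.034 and 0.027, in agreement with the table. The formula is strictly meant for simple correlations, so its use for the multiple coefficient is a further assumption.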

To emphasize further the value of these tests as prognostic devices 
and to illustrate the superiority of the training tests, the mean raw 
scores of different achievement groups are presented in Table III. 


TABLE III.—MEANS AND SIGMAS OF DIFFERENT ACHIEVEMENT GROUPS, WITH
RELIABILITY COEFFICIENTS OF THE DIFFERENCES

                          Mean score       Mean score       Difference             D/PE_D
Subject            N      aptitude test    training test    Aptitude   Training    Aptitude   Training
Mathematics 9      170    30.53 ± 7.82     47.58 ± 9.18
Mathematics 11     160    35.25 ± 10.44    59.16 ± 7.44     4.72       11.58       7.11       19.17
Chemistry 14       74     74.94 ± 7.10     51.45 ± 15.90
Chemistry 4        170    78.18 ± 5.66     [remaining entries illegible in source]








Mathematics 9 is higher algebra and is taken by those who have
had no higher algebra in high school and by those who cannot do satisfactory
work in mathematics 11, which is college algebra. Similarly,
chemistry 14 is general inorganic with no prerequisite and chemistry
4 is general inorganic with the prerequisite of high school chemistry.
Clearly, mathematics 11 and chemistry 4 are more advanced courses
than mathematics 9 and chemistry 14. Test scores show a like
difference between these groups, both in aptitude and in training.
The differences between the mean scores for these different achievement
groups are statistically significant as indicated by the size of
the coefficients of reliability of the differences. In mathematics the
coefficient is 19.17 for the training test against 7.11 for the aptitude
test. Training tests, therefore, differentiate more than do the aptitude
tests, which suggests that for sectioning purposes an actual test of
accomplishment is better than a test of aptitude. Further evidence
along this line is presented below.
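The reliability coefficient of a difference in Table III is the difference between two means divided by its probable error. The study does not give its working; the sketch below assumes the usual form PE_D = 0.6745 √(σ₁²/N₁ + σ₂²/N₂) for independent groups, and the function name is hypothetical.

```python
import math

def reliability_of_difference(mean_a, sigma_a, n_a, mean_b, sigma_b, n_b):
    """Difference between two group means divided by the probable
    error of that difference (assumed formula for independent groups)."""
    pe_diff = 0.6745 * math.sqrt(sigma_a ** 2 / n_a + sigma_b ** 2 / n_b)
    return (mean_b - mean_a) / pe_diff

# Mathematics 9 against Mathematics 11, values as printed in Table III.
aptitude_ratio = reliability_of_difference(30.53, 7.82, 170, 35.25, 10.44, 160)
training_ratio = reliability_of_difference(47.58, 9.18, 170, 59.16, 7.44, 160)

print(f"aptitude test: D/PE_D = {aptitude_ratio:.2f}")
print(f"training test: D/PE_D = {training_ratio:.2f}")
```

Under these assumptions the ratios come out near 6.9 and 18.7, close to but not identical with the printed 7.11 and 19.17, so the author's computation probably differed in some detail. The ordering is the same: the training test separates the two mathematics groups far more sharply.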


TABLE IV.—DISTRIBUTION OF GRADES IN PARTICULAR COURSES BY PER CENT

Subject            N      F     D     C     B     A     Total
English 4          320    31    36    27    5     1     100
Mathematics 9      170    27    37    23    10    3     100
Mathematics 11     160    27    34    24    11    4     100
Chemistry 14       74     43    15    26    8     8     100
Chemistry 4        170    33    28    22    12    5     100





TABLE V.—TRAINING TEST MEAN SCORES OF LETTER GRADE GROUPS IN VARIOUS
SUBJECTS. SCORES IN TERMS OF SIGMA

Subject           F Group         D Group         C Group         B Group         A Group         Total
English 4         42.64 ± 8.75    49.70 ± 8.04    55.68 ± 7.10    61.62 ± 8.67    66.02 ± 5.80    50.00 ± 10.00
Mathematics 9     39.41 ± 8.82    42.10 ± 6.80    49.27 ± 8.25    49.21 ± 5.24    53.91 ± 5.80    44.32 ± 9.05
Mathematics 11    51.42 ± 6.30    54.44 ± 6.66    58.34 ± 5.71    60.06 ± 7.63    63.11 ± 4.14    55.74 ± 7.34
Chemistry 14      36.32 ± 7.33    38.19 ± 7.71    42.69 ± 6.77    50.11 ± 5.63    49.23 ± 7.33    40.40 ± 8.48
Chemistry 4       50.53 ± 5.63    53.17 ± 7.52    54.85 ± 5.39    62.11 ± 4.40    63.89 ± 5.53    54.21 ± 7.36





TABLE VI.—APTITUDE TEST MEAN SCORES OF LETTER GRADE GROUPS IN VARIOUS
SUBJECTS. SCORES IN TERMS OF SIGMA

Subject           F Group          D Group          C Group         B Group         A Group         Total
English 4         44.13 ± 10.49    50.02 ± 8.54     54.51 ± 7.81    56.81 ± 5.59    64.32 ± 8.07    50.00 ± 10.00
Mathematics 9     42.48 ± 7.98     44.86 ± 9.95     50.26 ± 8.58    50.19 ± 9.17    50.39 ± 6.36    47.71 ± 10.10
Mathematics 11    51.13 ± 8.22     52.64 ± 10.00    54.42 ± 8.24    60.00 ± 7.57    60.88 ± 3.20    53.81 ± 13.49
Chemistry 14      41.42 ± 13.29    47.41 ± 7.44     50.28 ± 7.12    51.90 ± 6.58    52.94 ± 6.96    46.52 ± 11.23
Chemistry 4       52.15 ± 9.97     51.04 ± 8.25     52.78 ± 5.06    60.22 ± 6.52    57.50 ± 7.18    51.58 ± 8.96











Table IV presents by per cent the distribution of letter grades in
the various subjects. All of the distributions are very much skewed
toward the lower end of the scale. While such distributions tend to
invalidate the meaning of correlation coefficients, the following data
support the contentions put forth at the beginning of this paper on
the basis of the coefficients of correlation.

Tables V and VI present the mean scores for each letter grade 
group of each course in terms of placement test scores. The scores 
are converted into tenths of sigma of the distribution, the mean raw 
score being made equal to 50.00. Then a score which is one sigma 
above the mean is 60.00 and a score one sigma below the mean is 
40.00. The scores of all students in both mathematics courses were 
combined and the mean of the total group of 330 mathematics students 
was considered as 50.00. The same was done with the two groups of 
chemistry students. 
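The conversion just described places every raw score on a scale whose mean is 50 and whose sigma is 10. A minimal sketch follows; the raw scores are invented for illustration, and the use of the population standard deviation is an assumption, since the study does not say which form was used.

```python
from statistics import mean, pstdev

def sigma_score(raw, group_mean, group_sigma):
    """Place a raw score on a scale with mean 50.00 and sigma 10.00."""
    return 50.0 + 10.0 * (raw - group_mean) / group_sigma

# Hypothetical raw scores for one combined group (not data from the study).
raw_scores = [20, 30, 40, 50, 60]
m = mean(raw_scores)
s = pstdev(raw_scores)  # population sigma, assumed

scaled = [round(sigma_score(x, m, s), 2) for x in raw_scores]
print(scaled)
```

A raw score equal to the mean becomes 50.00, and scores one sigma above and below the mean become 60.00 and 40.00, as the text states.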

It is evident that these tests do differentiate students according to 
their ability to get grades. It is also evident that the training tests 
are again superior to the aptitude tests in this respect. Not only are 
the mean scores of letter grade groups farther apart in terms of training 
test scores, but the sigmas of the groups are smaller, as a general rule. 
In other words, the training tests tend to separate achievement groups 
more completely than do aptitude tests. In both mathematics and 
chemistry it is possible by starting with the mean score of the F group 
in the lower class, to proceed upward in terms of test scores, through 
each successive grade to A. In mathematics, the A group of the lower
class falls just above the F group of the higher class, and in chemistry
the A group of the lower class falls just below the F group of the more
advanced class. The
progression is not quite as regular in terms of the aptitude test scores. 
Neither are the differences as great though the sigmas are larger. The 
range of mean scores in English is from 42.64 for the F group to 66.02 
for the A group, or 23.38 points, on the training test. This represents 
a range of over two sigmas. The range on the aptitude test is from 
44.13 to 64.32 or 20.19 points. The training test has a slight advantage,
which is reinforced by the lesser variability of scores on the
training test. In each case the training tests show up to better
advantage than do the aptitude tests as prognostic devices. 
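The English comparison can be repeated for every course by subtracting the F group mean from the A group mean in Tables V and VI. The sketch below is only an arithmetic check on the printed means; the dictionary and function names are illustrative.

```python
# (F group mean, A group mean) in sigma-scale points, as printed
# in Table V (training tests) and Table VI (aptitude tests).
training = {
    "English 4": (42.64, 66.02),
    "Mathematics 9": (39.41, 53.91),
    "Mathematics 11": (51.42, 63.11),
    "Chemistry 14": (36.32, 49.23),
    "Chemistry 4": (50.53, 63.89),
}
aptitude = {
    "English 4": (44.13, 64.32),
    "Mathematics 9": (42.48, 50.39),
    "Mathematics 11": (51.13, 60.88),
    "Chemistry 14": (41.42, 52.94),
    "Chemistry 4": (52.15, 57.50),
}

def f_to_a_spread(means):
    """Distance from the F group mean to the A group mean."""
    f_mean, a_mean = means
    return round(a_mean - f_mean, 2)

for subject in training:
    t = f_to_a_spread(training[subject])
    a = f_to_a_spread(aptitude[subject])
    print(f"{subject}: training {t:.2f}, aptitude {a:.2f}")
```

On the printed figures the F to A spread is wider on the training test in all five courses, which agrees with the statement that the training tests show up to better advantage in each case.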

One measure of the importance of a difference between scores of 
two groups is the degree of overlapping. The greater the degree of 
overlapping the less significant is the difference. A detailed study of 
the degrees of overlapping has been made with these data, and the 
results support the conclusions already drawn. In every subject, 
there is a greater degree of overlapping in terms of scores on the
aptitude tests than on the training tests. A brief summary of these
percentages is contained in Table VII.


TABLE VII.—AVERAGES OF PERCENTAGES OF OVERLAPPING

                   Average per cent exceeding        Average per cent exceeding
                   median of higher grades           median of lower grades
Subject            Aptitude test    Training test    Aptitude test    Training test
English 4          22.4             14.9             85.4             87.7
Mathematics 9      27.1             23.1             75.2             78.7
Mathematics 11     38.2             22.4             66.7             82.0
Chemistry 14       31.7             16.6             73.9             87.0
Chemistry 4        24.6             15.9             73.8             85.6
Average            28.8             18.6             75.0             84.2











These percentages are the averages of the per cent of each letter 
grade group in each subject which exceeds the medians of the other 
letter grade groups, above and below. The smaller the per cent of 
cases exceeding the median of higher groups, the better is the test. 
The greater the per cent of cases exceeding the median of lower 
groups, the better is the test. While the revealed differences are not 
generally significant statistically, there is a marked tendency for true 
differences to be revealed. The final averages for the aptitude tests
show that 28.8 per cent of cases fall above the median of higher
letter grade groups and 75.0 per cent fall above the median of lower 
groups. The training tests again are better, the same figures being 
18.6 per cent and 84.2 per cent respectively. There can be no question 
but that these training tests are actually better prognostic devices 
than are these aptitude tests. 
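The overlap measure behind Table VII is the per cent of one letter grade group whose test scores exceed the median of another group. A minimal sketch is given here with invented scores; the function name and the data are hypothetical and are not taken from the study.

```python
from statistics import median

def percent_exceeding_median(group, reference):
    """Per cent of scores in `group` that lie above the median of `reference`."""
    cut = median(reference)
    return 100.0 * sum(score > cut for score in group) / len(group)

# Hypothetical test scores for two adjacent letter grade groups.
lower_grade = [38, 42, 45, 49, 56]
higher_grade = [44, 52, 55, 58, 60]

# A good test lets few of the lower group pass the higher group's median...
print(percent_exceeding_median(lower_grade, higher_grade))
# ...and lets most of the higher group pass the lower group's median.
print(percent_exceeding_median(higher_grade, lower_grade))
```

With these invented scores the two figures are 20 per cent and 80 per cent. Table VII reports averages of such percentages taken over all pairs of letter grade groups within a subject.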


SUMMARY AND CONCLUSION 


This study demonstrates that achievement in engineering courses 
can be predicted with a fair degree of reliability. Coefficients of cor- 
relation between test scores and final grades in five subjects are fairly 
high; test scores correlate satisfactorily with total scholarship; mean 
scores of different achievement groups on the same test show sig- 
nificant differences; mean scores of letter grade groups in single subjects 
show differences corresponding to the values of the letter grades; and 
finally, differences of letter grade groups are fairly marked as indicated
by degree of overlapping. These results are consistent throughout
this study, indicating that they are valid results, not a product of
chance. They are in line with the results achieved in colleges throughout
the country by means of intelligence tests, and in some engineering
colleges by use of Thurstone’s Engineering Aptitude¹ tests. For
administrational use the tests are valuable for purposes of sectioning 
classes on the basis of ability, for determining a student’s ability to 
do college work, and for indicating the load a student may carry 
satisfactorily. 

The superiority of achievement tests over aptitude tests has 
hitherto not been studied and reported in the literature. In this 
study, it is one of the outstanding results. The training or achieve- 
ment tests have consistently proved to be better instruments for 
predicting future achievement. Accomplishment, no doubt, does 
measure aptitude. But it also measures other characteristics such as 
habits of study, drive, and many others which affect scholarship. 
Aptitude tests may be more difficult to construct, hence less reliable 
as measures of future accomplishment. Whatever the theoretical 
basis for the comparative showing of aptitude and training tests is, 
for administrative purposes judging from the results of this study, 
tests of achievement seem more useful and accurate as instruments of 
prognosis. 





¹ Thurstone, L. L.: Intelligence Tests for Engineering Students. Engineering
Education, Vol. XIII, 1923, pp. 263-318.


INTELLIGENCE TESTING


The Relation between “Intelligence” and Reflex Conduction Rate. Lee Edward
Travis and Theodore A. Hunter. Journal of Experimental Psychology, Oct.,
1928, 342-354. The writers secured correlations between reflex conduction rate
and (a) Otis Higher, .87 (N, 44 adults); (b) University of Iowa qualifying examination,
.87 (N, 43); (c) Iowa Placement English, .81; (d) Iowa Comprehension, .77;
and (e) Iowa Placement Mathematics, .71.

Scholastic Aptitude Tests in Amherst College. Charles H. Toll. School and 
Society, Oct. 27, 1928, 524-528. Five successive entering classes (N, 157 to 232) 
were given two or more intelligence tests. Correlations between (a) Otis SA 
Higher, Army Alpha, Terman Group, three tests from American Council Psy- 
chological Examination, C. H. T. or sentence completion-vocabulary, two editions 
of the Amherst Test, both singly and in combination, and (b) college grades ranged 
from .16 to .38. The tests proved non-discriminative for predicting either failure 
or success in college work. 

The Relation between Cranial Capacity, Relative Cranial Capacity and Intelligence 
in School Children. G. H. Estabrooks. The Journal of Applied Psychology, 
Oct., 1928, 524-529. Correlations between intelligence (Dearborn A, Otis Group 
Primary, Stanford-Binet) and cranial capacity were positive but low; between 
intelligence and stature-capacity index, practically zero. 

Graded Series of Form Boards. Grace H. Kent and David Shakow. The 
Personnel Journal, Aug., 1928, 115-120. Illustrations of a new industrial model 
and of a new clinical model are included with the article. 

A Study of the Intelligence and Achievement of Full-blood Indians. Thomas R. 
Garth, Hale W. Smith and Wendell Abell. The Journal of Applied Psychology, 
Oct., 1928, 511-516. A total of 1000 cases in Grades IV to IX, average CA of 
12.9 to 17.9 were tested with the Otis Classification Test. Correlations between 
intelligence, achievement, CA and school grade were reported. 

The Influence of Heredity on the Mentality of Orphan Children. Robert A. Davis, 
Jr. The British Journal of Psychology, July, 1928, 45-59. The data from 1051 
orphanage children (Grades II to VIII) and from 504 Texas school children tested
with Haggerty and Dearborn tests revealed this conclusion: Orphanage children 
who have been in orphanage homes for a considerable time are no more alike than 
children who have been brought up in their own homes. 

What Changes the IQ? Mary L. Dougherty. The Elementary School Journal, 
Oct., 1928, 114-121. Case studies of two boys whose IQ’s shifted from 84 to 101
and from 134 to 156, the changes being due in one case to relief from home and
school strain, and in the other to competition in a new school class. 

Three Points of Interest to Mental Test Constructors. C. S. Slocombe. The
British Journal of Psychology, July, 1928, 31-33. The writer summarizes: The
selective test or multiple choice (although more nearly accurate in measuring)
and the recall test measure the same “type of intelligence”; “synonyms and
narrative completion are the best types of subtest for children age 11 to 12.”

The Role of Speed in Mental Ability. Ross A. McFarland. Psychological
Bulletin, Oct., 1928, 595-612. Summary of significant studies to date. 

Some Clinical Examinations and Their Implications. Edward A. Lincoln. 
Educational Administration and Supervision, Oct., 1928, 461-468. A report and 
discussion of the distribution of four hundred sixty-one children, ages 4 to 17, 
Grades I to X, by ages, grades, intelligence quotients, and the recommended dis- 
position of the clinical cases. The median IQ of the total group was 75. 

A Psychological Study of Athletes. Vern W. Ruble. American Physical 
Education Review, Apr., 1928, 219-234. A study comparing athletes with non- 
athletes at Indiana University reveals little difference between the two groups 
either in intelligence (Thurstone test) or in academic success. 


MEASUREMENT OF ACHIEVEMENT 


A Comparison of Five Types of Objective Tests in Elementary Psychology. G. M. 
Ruch and John W. Charles. The Journal of Applied Psychology, Aug., 1928, 
398-403. Corrected reliabilities for a test of one hundred items are presented: 
(a) Recall, .752; (b) five response, .809; (c) three response, .768; (d) two response,
.646; and (e) true false, .751. 

Minor Studies on Objective Examination Methods. The Negative Suggestion
Effect of True-false Tests. Hazel M. Roberts and G. M. Ruch. Journal of
Educational Research, Sept., 1928, 112-116. The effect proves to be small,
positive and transitory.

A Study of Examinations in Graduate Courses in Education. John M. Brewer. 
The Educational Record, Oct., 1928, 225-241. An analysis of recent examinations 
at Harvard University. 

The Objective Measurement of English Composition. Robert Pooley. The 
English Journal, June, 1928, 462-469. “The test consisted of eight sections
grouped into these general classes: Mechanics, including punctuation and spelling; 
sentence structure, involving the analysis and synthesis of sentence elements; and 
maturity, tested by the pupil’s skill in the use of idiomatic expressions and in the 
comprehension of words.” A reliability coefficient of .83 (N, 19) is reported. 

Use of New Type Examination Questions in Psychology in the University of 
Minnesota. Donald G. Patterson. School and Society, Sept. 22, 1928, 369-371. 
Suggestive methods of collecting, validating, and using questions for college classes 
from year to year are presented. 

The Abilities and Achievements of Elementary School Pupils before and after a 
Vacation. M. J. Nelson. School and Society, Sept. 22, 1928, 371-372. The 
writer tested fifty cases in each grade (III, V and VII) with Otis SA Intermediate, 
National Intelligence Test A, Stanford Achievement, Morrison McCall Spelling, 
and Courtis Standard Research Arithmetic B. Only in intelligence for the seventh 
grade was the vacation gain greater than the school gain.

A Mental-educational Survey of Iowa Junior Colleges. George D. Stoddard.
The School Review, May, 1928, 346-349. Slight differences tending to favor the 
junior colleges (N, 378) rather than University of Iowa Freshmen (N, 1048) are 
found in the Iowa Comprehension Test, Placement Examination (English and 
Mathematics Aptitude), and the High School Content Examination. 


PSYCHOLOGY OF LEARNING AND SCHOOL SUBJECTS


The Even-front System versus the Rotation System in Laboratory Physics. H. W. 
Duel. The School Review, June, 1928, 447-454. A group of one hundred sixty- 
eight pupils in second term physics was practiced for a period of twenty laboratory 
periods of one hour each. In the Rotation System small groups of pupils worked 
on different experiments, each group proceeding at its own rate. In the Even 
Front System all pupils proceeded to the next experiment only after sixty-seven to 
seventy-five per cent had finished a given experiment. The Rotation System 
proved decidedly superior in an objective test of one hundred thirty questions, in 
the mean number of exercises completed and significantly superior in the Gleen- 
Osbourn and Hurd physics tests. 

A Comparative Study of the Results Obtained by the Method of Mastery Technique 
and the Method of Daily Recitation and Assignment. M. N. Funk. The School
Review, May, 1928, 338-345. Of four classes (twenty-one to twenty-three pupils 
each) in the senior class of a high school, two served as the experimental group. 
An experiment over a period of nine months showed no differences (a) in objective 
test scores covering information, application, and organization, or (b) in Chapman 
Unspeeded Reading Comprehension Test. However, the mastery-technique group 
read a greater number of pages in reference books; pupil opinion also favored this 
method. 

A Comparison of the Lecture-demonstration, Group Laboratory Experimentation, 
and Individual Laboratory Experimentation Methods of Teaching High School 
Biology. Palmer O. Johnson. Journal of Educational Research, Sept., 1928, 
103-111. Three groups of eleven students each and three of seventeen each were 
tested for immediate and for delayed recall. In achievement, “the demonstration
method outranked the others in five of the six sets of experiments,” but results 
were not conclusive statistically. There was no significant difference between 
group and individual methods. 

A Study of Difficulties in Chemistry. Arthur R. Stewart. School Science and 
Mathematics, Nov., 1928, 838-848. A comprehensive study of the difficulties and 
errors of seventy-two high school students in chemistry. 

The Relative Values of Unified and Correlated Mathematics in Presenting the 
Fundamental Operations. Raymond R. Wallace. School Science and Mathematics, 
Oct., 1928, 740-747. Most of the differences, although not significant, favored the
unified course. 

A Preliminary Study of Mathematical Difficulties. Wilbur Alden Coit. The 
School Review, Sept., 1928, 504-509. Groups of students (two hundred sixty high 
school and two hundred thirty university) were administered tests in elementary 
algebra. An analysis of these errors revealed that the types of errors were similar 
in high school and in the university. 

Analysis of Difficulties in Decimals. Leo J. Brueckner. The Elementary 
School Journal, Sept., 1928, 32-41.

Recurring Words and Their Relation to Difficulties in Comprehension. Ethel L.
Fennell. The Elementary School Journal, Sept., 1928, 42-53. An analysis of the
different meanings for the same word as well as the recurrence of each word
appearing in primary readers showed that context might be as important a determiner
of difficulty as recurrence.

A Study of the Vocabulary of Ancient History Texts. C. G. Shambaugh. School
and Society, Oct. 20, 1928, 494-496. Five textbooks for ninth grade pupils were
examined. From seventeen to thirty-seven per cent of the total number of different
words in each text were not found in the Thorndike Word List. “More than
half of the uncommon words do not appear in any one text more than once.”

Influence of Type Form on Speed of Reading. Miles A. Tinker and Donald G.
Patterson. The Journal of Applied Psychology, Aug., 1928, 359-368. Using the
Chapman Cook Speed of Reading Test with six hundred forty students, “lower
case letters” prove definitely superior in rate to “all capitals” and slightly superior
to “italics.”

CHARACTER AND PERSONALITY TRAITS 


Some Observations Concerning the Reliability of the Pressey X-O Test. Lorin A.
Thompson, Jr. and H. H. Remmers. The Journal of Applied Psychology, Oct.,
1928, 477-494. A group of three hundred six university students was administered
the Pressey X-O Form A. A response count was used to determine sex
differences. Correlations between total affectivity score and (a) psychology grades
were .19, (b) Otis Higher, .07, (c) freshman academic grades, .07, (d) Morgan’s
mental test, .04; between total idiosyncrasy score and (a) .147, (b) .169, (c) .61,
(d) .08.

An Experimental Study of Temperament. David W. Oates. The British 
Journal of Psychology, July, 1928, 1-30. Records on a group of fifty boys in a 
Secondary School were secured with (a) the Downey Will Temperament Test, (b) 
ten group intelligence tests, and (c) on school examination marks. The writer 
concludes that “The factor or factors causing the correlation between intelligence
and scholastic ability do not enter into temperament. Scholastic ability depends
upon intelligence and temperament which are apparently independent of one
another.”

Overstatement in Third-grade Children. Herbert Woodrow and Violet Bemmels. 
The Journal of Applied Psychology, Aug., 1928, 404-416. A group of two hundred 
seventy-one children, CA from 7-0 to 11-6 in grade III, were given an overstate- 
ment test (corrected reliability of .71). Correlations were secured as follows: 
Between Overstatement test and (a) MA (Otis Group Primary), .29; (b) teachers’ 
character rankings, .05; (c) achievement (school marks), .39. 

The Method of Selecting the Members of the High-school Honor Society. William 
C. Reavis. The School Review, June, 1928, 423-429. A study of ratings regard- 
ing school citizenship traits. 


EDUCATIONAL PSYCHOLOGY 


The Significance of Unambiguous Evidence Regarding Environmental Influences.
William C. Bagley. Educational Administration and Supervision, Oct.,
1928, 441-450. A criticism of articles in and papers on the Twenty-seventh
Yearbook of the National Society for the Study of Education.

Memory. John A. McGeoch. Psychological Bulletin, Sept., 1928, 513-549.
A review of experimental studies. 

Experimental Studies of Thought and Reasoning. Carroll C. Pratt. Psy- 
chological Bulletin, Sept., 1928, 550-561. A review of studies. 

The Effect of Continuous Work upon Output and Feelings. A. T. Poffenberger.
The Journal of Applied Psychology, Oct., 1928, 459-467. A group of ten to thirteen
college subjects were practiced for 5½ hours in (a) adding two-place numbers,
(b) sentence completion, (c) judging compositions, and (d) Thorndike Intelligence
Examination for High School Graduates. The losses from one half-hour period
to the next were small. “No positive relation between changes in output work in
a variety of activities and changes in the feelings” was found.

Resemblance in the Handwriting of Twins and Siblings. Emily Kramer and 
Charles E. Lauterbach. Journal of Educational Research, Sept., 1928, 149-152. 
Data collected for two hundred five pairs of twins and one hundred one pairs of 
siblings reveal higher correlations for the former. 


ADMINISTRATION AND SUPERVISION 


Scholastic Accomplishment in the Junior High School. F. C. Landsittel. Journal
of Educational Research, Sept., 1928, 127-135. A comparison of three hundred
seventy-one pairs of university freshmen in intelligence, CA, senior high school
grades, freshman grades, science, mathematics, and languages, revealed the following
conclusion: “The general showing is not to the credit of the junior-senior high
school (in comparison with the 8-4 high school); every really significant difference
is against it.”

Waste in Professional Education. Richard E. Hyde. Journal of Educational 
Research, Sept., 1928, 144-148. In a course in educational psychology, the 
writer discovers that eighteen per cent of the freshmen and twelve per cent of the 
upper classmen had a passing knowledge of the subject-matter at the time they 
enrolled. 

A Comparative Study of Two Groups of Teachers College Students. John R.
McCrory. Educational Administration and Supervision, Oct., 1928, 469-475.
Students intending to pursue a one-year course in comparison with those intending
to take the two-year course were (a) slightly less intelligent by Otis SA Higher,
(b) slightly poorer in high school grades and in college marks, (c) considerably older
chronologically.

A Comparison of Letter Boys and Non-letter Boys in a City High School. William 
A. Cook and Mabel Thompson. The School Review, May, 1928, 350-358. Little 
or no difference in scholarship between the two groups was found. 

One Subject at a Time. Willis Thompson. The School Review, Sept., 1928, 
541-546. A comparative study with small groups in high school of the advan- 
tages of studying one subject (algebra) all day in place of four different subjects 
each day. No reliable differences were obtained. 

Occupational Destination of Ph.D. Recipients. M. E. Haggerty. The Educa- 
tional Record, Oct., 1928, 209-218. The study shows that but a small fraction of 
the staff of a number of large universities was specifically assigned to research work. 

The Time and the Personnel Available for Administrative Duties in Secondary
Schools. W. C. Reavis and Robert Woellner. The School Review, Oct., 1928,
576-592. Tabulated results of a check list from five hundred twenty-two secondary
school principals.

Factors Conditioning the Success of School Surveys. R. E. Garlin. School and
Society, Sept. 15, 1928, 337-340. The survey publicity experiences of twenty-one 
states and of forty-eight cities were collected through a questionnaire. The 
factors were analyzed into those contributing to success and those contributing to 
failure. 


CHILD PSYCHOLOGY


The Child Who is a Misfit. Ivan A. Booker. The Elementary School Journal, 
Oct., 1928, 140-146. The advancement of two mentally retarded boys to junior 
high school status netted favorable returns. 

A Study of Play in Relation to Intelligence. Harvey C. Lehman. The Journal 
of Applied Psychology, Aug., 1928, 369-397. An analysis of the results from the 
Lehman Play Quiz administered to 6000 children in Grades III to XI showed that, 
in comparison with dull pupils, bright pupils engaged in (a) fewer activities of a 
motor type, (b) more activities which require reading, (c) fewer religious activities, 
(d) more activities involving sense of humor, (e) fewer activities of a social nature. 

Observation and Training of Fundamental Habits in Young Children. E. A. Bott, 
W. E. Blatz, Nellie Chant and Helen Bott. Genetic Psychology Monographs, 
Vol. IV, No. 1, July, 1928. An objective investigation of the sleeping habits, 
play activities, eating habits, etc. of a small group of nursery school children. 

Recognition of Fatigue in the School Child. Max Seham. The Elementary 
School Journal, Oct., 1928, 106-113. 


EDUCATIONAL AND VOCATIONAL GUIDANCE 


Factors Affecting the Success of College Freshmen. Robert M. Bear. The 
Journal of Applied Psychology, Oct., 1928, 517-523. Freshmen classes, one 
hundred seventy-two subjects, were rated by Otis Higher Examination and by 
academic averages. Tables show differences in the two ratings which were associ- 
ated (a) with father’s occupation, (b) with type of course entered, (c) with partici- 
pation in athletics, (d) with home state or section of states, and (e) with CA. 

Extra-curriculum Activities and Academic Work. Albert Beecher Crawford. 
The Personnel Journal, Aug., 1928, 121-129. For 2643 undergraduates at Yale 
University, this study reports the number of cases, scholastic averages, mental 
test ratings, correlations between mental rating and grades, mean time reported 
spent in study per week, and mean time reported spent in extra-curriculum
activities per week, separately for each extra-curriculum activity; it further
compares, on these factors, athletes with non-athletes and students engaged
in activities with those not engaged.

A Study of the Load of Senior High School Pupils in Los Angeles. Hildur C.
Osterberg. The School Review, May, 1928, 359-369. Low correlations were 
obtained between study time and (a) intelligence quotient or (b) school marks. 
The study included statistics on activity load, and on the load of failing pupils. 

Occupational Interests of College Women. Esther Allen Gaw. The Personnel 
Journal, Aug., 1928, 111-114. The Freyd Occupational Interest Blank for 
Women was given to one hundred ninety-one freshmen students, and a year later
to one hundred five of these same students as sophomores. Correlation between
means of items for the two testings was practically 1.00; correlations between
freshman choice and sophomore choice were .40 to .61 for the middle fifty per cent
of the group. 

A Revision of the Chapman-Sims Socio-economic Scale. J. D. Heilman. Jour-
nal of Educational Research, Sept., 1928, 117-126. Records of six hundred eighty-
eight homes were used in the making of a scale suitable for use in Denver.

Mental Ability with Reference to Selection and Retention of College Students. 
F. P. O’Brien. Journal of Educational Research, Sept., 1928, 136-143. From a 
study of high school graduates and of college entrants, the author concludes that 
“it appears from the facts at hand that the colleges are really successful neither in
attracting nor in holding the mentally fit.” 

Comparative Vocational Success in a Trade School and in Industry. Orlie M. 
Clem and H. S. Bennett. The School Review, May, 1928, 380-387. A study of
two hundred thirty-two boys embodying (a) reasons for leaving trade school to
go to work, (b) trade selected, (c) earnings each year, first to fifth, after leaving 
trade school of these three groups: one year, two years, three to four years. 

Mental Ratings, Scholarship and Health. Harry M. Keal. School and Society, 
Sept. 1, 1928, 277-280. From a study of high school students, the writer presents 
evidence to support the conclusion “that health (especially of pupils with low men- 
tal ratings) is the greatest single factor governing success in school, and that 
mental ratings which do not take physical condition into consideration are not a 
true index of the learning ability of high-school pupils.” 

Student Opinion at Syracuse. Daniel Katz. The Personnel Journal, Aug.,
1928, 103-110. The questionnaire covered the following points: Reasons for com- 
ing to college, rating of college activities, methods of instruction, academic free- 
dom, attitudes toward scholarship, attitudes on cribbing, adequacy and fairness 
of grading, sex segregation activities, coeducation, religious activities, nature of 
belief in the Deity, fraternities, student self-government, and compulsory military 
training. 

Where Does He Rank? Edgar M. Finck. The School Review, June, 1928, 
455-464. This report includes (a) questionnaire results of ranking methods used 
in New Jersey and (b) a study of intercorrelations (N, 33) between these factors: 
IQ, all work above Grade VIII, all work above Grade IX, all work of last four 
years, all work of the last three years, etc. As a result of his findings, the writer 
emphatically advises the discard of the IQ for scholarship ratings. 


MEASUREMENT OF SPECIAL ABILITIES 


A Note on the Validity of a Test of Social Intelligence. M. Eustace Broom. The 
Journal of Applied Psychology, Aug., 1928, 426-428. A raw correlation between
the Moss “Social Intelligence Test” and the Thorndike Intelligence Examination
of .56 is reported for two hundred fifty-eight students entering a teachers college. 

The Reliability and Validity of the Seashore Tests of Musical Talent. A. W. 
Brown. The Journal of Applied Psychology, Oct., 1928, 468-476. With about 
ninety subjects over an interval of four months, reliability coefficients on subtests 
ranged from .29 to .71. Correlations on subtests with teacher judgment, from .11 
to .41; correlation of total Seashore test with teacher’s rankings was .38; with CA, 
—.34; with intelligence ratings, .24. 


STUDY HABITS AND METHODS


The Permanent Effects of Training in Methods of Study on College Success. 
Luella Cole Pressey. School and Society, Sept. 29, 1928, 403-404. From two 
groups of fifty probation college students each, one a control and the other an 
experimental group, the writer concludes “(a) that it is not worth while to train
students below the twenty-fifth percentile in intelligence . . . (b) Students 
who have enough initial ability to learn to study, those above the twenty-fifth 
percentile, are almost certain to be ‘saved.’” 

Another Attempt to Teach How to Study. Ruth Strang. School and Society, 
Oct. 13, 1928, 461-466. Tests of knowledge of study habits, of ability to listen to 
a lecture, of ability to answer questions on the material read, of ability to get a 
general idea of a paragraph, and of ability to get main points in logical relationship 
are combined with personal data and class exercises in “how to study” in an
attempt to solve this problem. 


MISCELLANEOUS 


The Importance of Listening Ability. Paul T. Rankin. The English Journal, 
Oct., 1928, 623-630. “The average per cent of waking time devoted to each form
of communication for a total of sixty days as recorded by twenty-one different 
people” was 31.9 for talking; 11 for writing; 42.1 for listening; and 15 for reading. 
School emphasis is upon written expression and understanding, while life use 
distinctly favors oral expression and understanding. 

Research High Spots in Physical Education. James Edward Rogers. American 
Physical Education Review, Sept., 1928, 443-447. 

Methods of College Teaching. C. R. Wiseman. School and Society, Oct. 6, 
1928, 433-434. Returns from one hundred fifty college upperclassmen show 
student preferences regarding content of course, instructor or student activity in 
class, use of class time, and testing. 

Effect of Errors of Measurement on the Difference between Groups. C. L. Huf- 
faker. The Journal of Comparative Psychology, Oct., 1928, 313-315. A correc- 
tive formula is suggested. 

Sex Development in Apes. Harold C. Bingham. Comparative Psychology 
Monographs, Vol. V, No. 1, May, 1928.



A SATISFYING TEXTBOOK IN OBJECTIVE PSYCHOLOGY


Fundamentals of Objective Psychology, by John Frederick Dashiell. 
New York: Houghton Mifflin Co., 1928. Pp. XVIII + 588. 


Textbooks in general psychology issued during the last fifteen years 
show a cumulative shift toward the use of objective methods and a 
tendency to desert such topics as the analysis of sensation for problems 
that have more general interest. This change in program has involved 
the authors of the more radically objective texts in two difficulties. 
Some authors have found it necessary to introduce so many new terms 
that their students find themselves speaking a language understood 
only in their own university. And some authors have been so pre- 
occupied with method that they have had no space or leisure for the 
mass of detailed information that gives a course in psychology its real
interest. 

Dashiell’s new text has surmounted these difficulties. Its language 
is so well considered and conservative that its radical and illuminating 
explanations can be understood without a special lexicon. It contains 
fully twice as much account of psychological experiment and observa- 
tion as is offered by any other elementary text. In addition, it marks 
decided progress in clarifying and systematizing objective psychology. 

The chapter titles give some insight into the nature of the book. 
After two short introductory chapters on method (illustrated by four 
sample problems) and on the field of behavior, three chapters (102 
pages) are devoted to the analysis of behavior and the physiological 
basis of behavior. The seventh chapter gives an account of “Reflexes
and the Integration of Action Units,” devoting twelve pages to the
results of experimental work on the conditioned reflex. A chapter 
on “Native Reaction Patterns” reviews the findings on instinct in 
animals and the Hopkins observations of infant behavior, agreeing for 
the most part with the treatment of instinct given by Watson and by
Allport. Of the emotions it is said that their common names “refer
to differences in viscerally reenforced or inhibited overt behavior pat-
terns that have been classified and labeled in terms of their social signifi-
cance rather than in terms of their visceral components.” The chapter
on “Motivation” (46 pages) is the best in the book. Three types of
motivation, hunger, sex, and unfavorable skin conditions, are subjected
to an analysis with the concluding generalization that “the funda- 
mental part of the drive is to be traced to certain tissue conditions of 
the organism that set up afferent neural impulses passing to and 
through nerve centers and out to effectors, . . . exciting them to 
excess activity; and that the adequate external stimulus, such as food 
or mate, serves here as a trigger or release for directing some of the 
reactions (especially the consummatory ones) by providing the neces-
sary environmental opportunity for their full appearance . . .”
Attention is dealt with in a chapter on “Postural Responses,” in which
thinking is described as “a process of self-stimulating and responding.”

The conditioned response is offered as “the type and the elementary
complete unit of all learning,” redintegration being explained in terms
of proprioceptive conditioning. Perception, defined as adjustment to 
larger wholes of which the actual stimuli are only a part or sign, is 
subjected to some well-conceived analyses. Its chapter includes a 
good section on the perception of the beautiful. 

The remainder of the book is devoted to chapters on “Social
Behavior,” “Discriminating and Generalizing,” “Thinking,” and
“Personality.” Thinking is explained in terms of substitute reactions
directed by postural mechanisms. 

Throughout the book illustrations and examples have been used 
freely and with good effect. Psychology is presented as a natural 
science based on observation and experiment. There are included 
one hundred ten figures and diagrams, a refreshing proportion of these 
being new. In the reviewer’s opinion this text is by far the best so 
far written and is apt to remain the best for some time. 


E. R. GUTHRIE.
University of Washington. 





PRACTICAL EDUCATIONAL PSYCHOLOGY 


Educational Psychology, by A. M. Jordan. New York: Henry Holt and 
Co., 1928. Pp. 460. 


“In preparing this text the author has ever kept in mind the
use to which the material is to be put,” states Jordan in his preface.
This seems to be the key to the differences which we find between this
Educational Psychology and many other texts in the same field. The 
book does not present a systematic psychology. The treatment of 
instincts, for example, is sketchy and obviously directed to the school 
room situation. Tendencies which seem to be of importance in educa- 
tion are discussed; others are dismissed with short notice. Many will 
consider this a desirable characteristic when the book is to be used with 
students who have recently completed the elementary psychology 
course. It saves time, and keeps the treatment close to the subject. 

The only significant new contribution in the work seems to be in 
the discussion of learning, which is developed from an analysis of the 
problem-solving situation rather than from associative laws. The 
recognition of the problem nature of school situations is early impressed 
on the beginner, and the nostrum of using over-simplified laws of 
learning for all explanations is avoided. 

The organization of the text for class use has both merits and diffi- 
culties. The captions and guides for learning are not quite adequate— 
a more generous use of italics and headings would be an improvement. 
The chapter on statistics is introduced without sufficient motivation, 
and seems unnecessarily difficult for students using the book in the 
given chapter order. The illustrative material is, in general, excellent, 
though some of the statistical treatments stand in need of much pro- 
fessorial interpretation. 

The reactions of a group of students who used the book as a text 
last summer might be of interest. About half of them considered Jor-
dan “harder reading” than three other well known texts used for
collateral reference. The discussion of transfer in school subjects was 
considered most difficult, and the excellent section on maladjusted 
children most interesting and profitable. A great majority of the 
class considered that they had profited by the use of the exercises 
provided at the ends of the chapters. 

A wide use may be predicted for Jordan by instructors who have a 
practical bias, and who do not fear statistics. 

LAURANCE F. SHAFFER.
Carnegie Institute of Technology.



A LIVELY TEXT ON PERSONALITY


The Psychology of Personality: An Analysis of Common Emotional
Disorders, by English Bagby. New York: Henry Holt and Co.,
1928. Pp. 236.


There never can be too great a supply of lively, readable books in 
the collateral fields of psychology. Bagby’s “Psychology of Per- 
sonality” is such a specimen. It seems interesting to persons of all 
degrees of training in psychology beyond a rather low minimum, and 
should be helpful to the teacher and student in better understanding
their own conduct. 

Bagby defines his “personality” as “persistent traits of emotional
conduct and thinking.” The treatment is sound, reasonably sys-
tematic, and not inspirational and preachy. The principal unique
concept is that of “tension-reduction.” An emotional situation is
considered as creating an organic state of tension the reduction of which
constitutes the response. The demonstration that the undesirable
forms of adjustment, “maladjustment mechanisms,” are tension-reduc-
ing devices helps in understanding and eliminating them.

Much illustrative material, drawn largely from problems of young 
people of the age of undergraduate students, is included. 


LAURANCE F. SHAFFER. 
Carnegie Institute of Technology. 





PRE-SCHOOL EDUCATION 


Children in the Nursery School, by Harriet M. Johnson. New York: 
The John Day Company, 1928. Pp. XX + 319. 


“Children in the Nursery School” is the record of eight years of
experience in an experimental situation set up by the Bureau of
Educational Experiments, already well known through the work of its
ally, the City and Country School. In this volume Miss Johnson tries 
to give a picture of the lives of children in the Nursery School, and 
something of her philosophy and procedure concerning early childhood 
education, at least as far as she has been able to develop these. 

Basing her opinions upon recorded observations which have been 
made throughout the entire period, the author is seldom dogmatic 
about her policies, even though they differ radically in many respects 
from those of most nursery schools. She does not agree with those 
who would regard nursery schools as high-grade, well-organized 
parking places meeting chiefly the physical needs of little children.
Neither does she approve making of the nursery school a preparatory 
institution for kindergarten and later elementary education. It is 
the here and now life of the child as he is between the ages of fourteen 
and thirty-six months which is significant, and with an extraordinary 
amount of restraint, she and her co-workers have set themselves the 
task of watchful observation rather than directive interference. 

In a preliminary section, “Why We Do What We Do,” is set forth
the general educational point of view in regard to the children’s 
impulses to action and experimentation. Methods of dealing with the 
routine habits and conventions having to do with health and the 
social amenities are sketched, and the schedule and daily program, or 
rather lack of it, is accounted for. 

The second section describes the planned environment: the
physical, consisting of activities and materials; the social, being the
child’s contact with other children and adults; and the part which 
language and rhythms play at this early growth stage. Miss Johnson 
maintains that at this age language should not be dealt with primarily 
as a tool for the communication of thought but as a form of motor and 
sensory experience in the use of the vocal apparatus. She would 
stimulate and keep alive the child’s impulse to use his muscles, includ- 
ing the speech mechanism, rather than encouraging the early adoption 
of sophisticated language forms which can mean nothing to the child 
of this age. Baby talk is not indulged in, neither is the attempt made
to teach children to speak by calling their attention to names of things 
about them. In this respect the nursery school has much of wrong 
home training to undo or re-direct; hence, it is highly desirable that 
more extensive investigation be made of this phase of early develop- 
ment. Miss Johnson’s close coordination of early music, language, 
and rhythm likewise suggests further investigations. 

The third section describing records and record-keeping will be of 
special interest to all concerned with the clinical study of childhood. 
This reviewer regrets that it was not possible to incorporate in the 
present volume more extensive and coherent records of the observa- 
tions made regarding the children’s use of their environment. We 
should pass on from literature on how to keep records to the stage of 
actually collecting voluminous records of real children in real situations, 
if the psychological study of infancy and young childhood is to be 
furthered. The author’s use of her recorded observations as source 
material by which to check her statements regarding facts of growth 
has been excellent. But for psychology such source material is more
greatly needed and more valuable than the generalizations which are 
drawn from it. 

This book challenges much of contemporary nursery school
practice and the treatment of the young child in the average home, and
as such will be welcomed by all who are sincerely seeking for a better 
understanding of the psychology of infant growth. 


ANN SHUMAKER. 
The Lincoln School, New York City. 





TWO CONTRIBUTIONS IN THE FIELD OF MEASUREMENT TO SCHOOL
ADMINISTRATION 


An Evaluation of the Use of Certain Educational and Mental 
Measurements for Purposes of Classification, by Arthur D. Hollings- 
head. New York: Teachers College, Columbia University, 
Contributions to Education, No. 302, 1928. Pp. IX + 63. 

The Improvement of Measurement through Cumulative Testing, by 
Noel Keys. New York: Teachers College, Columbia Univer-
sity, Contributions to Education, No. 321, 1928. Pp. VIII + 81.


It is, indeed, commendable that more and more research is being 
done in solving the practical problems of educational administration by 
scientific method. Not only is this tendency wholesome for the 
practical field, but also it keeps research intimately connected with 
utility. An interplay between the field and the research laboratory 
is a goal which should prove most fruitful. 

The two investigations mentioned in this article illustrate remark- 
ably well a happy combination of field and research endeavor. They 
attempt to answer specifically two problems which the progressive 
administrator faces: First, on what bases should a large group of 
children be sectioned in order that all of the pupils in each class will 
proceed throughout a semester at about the same rate in reading, in 
arithmetic, etc.? Second, to increase the reliability of educational 
and vocational guidance, how may school measurement records be 
combined optimally? 

The first of these reports deals with four hundred twenty-five 
pupils who were tested with the National Intelligence Test and the 
Stanford Achievement Test both in Grade V and again in Grade VI. 
On the basis of these testings, tables were constructed showing the 
means and standard deviations in educational age and arithmetic
age resulting from sectioning on the basis of mental age. Similar
tables presented the distributions obtained from classifying by educa- 
tional age and by arithmetic age. These calculations tended to show 
that pupils homogeneous in one characteristic were not homogeneous 
in another. The author concludes that educational age is the best 
single basis upon which to classify pupils. 

The implications from this study might be stated thus: That 
educational tests should supplant mental tests for sectioning a group 
in terms of educational achievement. 

The second of these investigations represents a more ingenious 
use of statistical devices and a more thorough exploitation of material. 
Consequently, the conclusions are more detailed and valid. This 
study bears this subtitle: “An empirical study of two hundred ele-
mentary school children over a period of four years.” A group of
Grade VIII pupils with an average IQ of 98 to 100 were tested with 
the National Intelligence Test, a spelling test, Thorndike-McCall 
Reading, New York Geography, Woody-McCall Arithmetic, and Otis 
Arithmetic. Intercorrelations between the tests for the same year 
and for different years are presented. 

Two important conclusions are forthcoming: That brightness 
scores prove slightly more reliable than do raw scores, T scores, and 
quotients in prediction; that the averaging of scores on tests given 
three years previous or earlier is inadvisable, the correlations between 
more recent tests being the higher. 

The first of these dissertations should prove especially helpful 
to the school administrator. The second should find a place in the 
library of every one who is interested in research as well as
administration.


JAMES E. MENDENHALL.
The Lincoln School, Teachers College, Columbia University.



Recently Published


The American Normal School: Its Rise and 
Development in Massachusetts, by 
Vernon L. Mangun. With an Intro-
duction by William C. Bagley. $3.50 +
12¢ post. 


In his Introduction Professor Bagley says:

“In the deliberations, debates, and contro-
versies that resulted in the firm establish-
ment of the Massachusetts State normal
schools, a momentous issue was at stake;— 
nothing less, indeed, than the basic ques- 
tion, Can and will a self-governing people 
provide the conditions which alone insure 
the success and perpetuation of democratic 
institutions? . . . It would be hard to 
overestimate the significance of the decision
that Massachusetts finally reached in
respect of the normal schools. In its far- 
reaching influence, this decision transcended 
both state and national boundaries. The 
failure of Massachusetts at this juncture, 
as Henry Barnard said, ‘would have 
changed the whole condition of public 
instruction in this country for a half 
century, if not forever.’ In the present 
volume, Dr. Mangun narrates the story of 
this achievement. He has done his work 
with meticulous regard for the historic
proprieties and yet not without a keen and 
thoroughly legitimate sensitiveness to the 
dramatic qualities that give to the real 
crises of social evolution their fundamental 
human meaning.” 




State Control of Secondary Education, by 
O. L. Troxel. $2.50 + 10¢ post. 
This study concerns itself with the
elements of state control over public
secondary education as revealed through 
an analysis of state laws and also the regu- 
lations of state departments of education 
governing high schools. The application 
of these elements of control is presented
through a study of the methods of enforce- 
ment of control in the various states. 
Objective evidence of such enforcement is 
presented through an analysis of reports 
which the state departments of education 
require the high schools to submit and 
such other miscellaneous objective evidence 
as is available from the various states. The 
results of control, and an evaluation of the 
types and methods of control, are essayed. 
The study makes plain the need for stand-
ardizing the various elements of control in
order that it will become increasingly less 
difficult to make a comparative study of the 
control of secondary education in the 
United States. 


Contemporary Municipal Government of 
Germany, by Bertram W. Maxwell. 
$2.50 + 10¢. 

Professor Maxwell has made a study
of municipal government in Germany
since the Revolution in 1918. He describes 
the most important and significant changes 
brought about by the Revolution in the 
administration of German cities. While 
the organization indicated is by no means 
in its final form, since there is a continu- 
ous mutation and development going on, 
adjusting agencies to new conditions as 
they arise, the general trend is clearly 
revealed in changes that have already been 
recorded. Whereas the changes have been 
many, yet as a whole it is found in this 
instance, as in other violent governmental 
changes of history, local government has 
not been affected to the same extent as the 
national government. Changes tending 
toward democracy are, however, noticeable 
in all the new municipal codes of Germany: 